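1. Introduction
The previous section covered Python compilation and disassembly. As shown there, the dis module can take a PyCodeObject extracted from a compiled pyc file and disassemble it into readable bytecode. This section reads through the source of the dis module in detail.
2. How the dis module works
Official documentation: https://docs.python.org/2/library/dis.html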
The dis module supports the analysis of CPython bytecode by disassembling it. The CPython bytecode which this module takes as an input is defined in the file Include/opcode.h and used by the compiler and the interpreter.
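Put simply, dis turns compiled Python bytecode back into a readable instruction listing.
It accepts a Python source file, a class or function in memory, or an unmarshalled PyCodeObject, and prints the corresponding bytecode for analysis.
2.1 Disassembling a source file:
When a source file is given to the dis module (for example with `python -m dis file.py`), dis compiles the file and prints the bytecode listing of the result.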
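A minimal sketch of this workflow, assuming Python 3 (the annotated listing in section 4 is Python 2). The source text and the `<example>` file name are made up for illustration; the sketch compiles the text with the built-in compile() and passes the resulting code object to dis.dis.

```python
import dis
import io

# Hypothetical source text standing in for the contents of a .py file.
source = "x = 1\ny = x + 2\n"

# Compile the source to a code object, the same step dis performs for a file.
code = compile(source, "<example>", "exec")

# dis.dis prints its listing; the file= parameter (Python 3.4+) captures it.
buf = io.StringIO()
dis.dis(code, file=buf)
listing = buf.getvalue()
print(listing)
```

The exact opcode names and offsets in the listing depend on the interpreter version.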
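2.2 Disassembling a class or function in memory:
Passing a class, a function, or even an instance (which is disassembled through its class) to dis.dis prints the compiled bytecode of that object.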
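A short sketch of disassembling in-memory objects, assuming Python 3. The function `add` and the class `Greeter` are hypothetical names used only for this example.

```python
import dis
import io

def add(a, b):
    return a + b

class Greeter:
    def greet(self, name):
        return "hello " + name

# Disassemble a single function.
func_buf = io.StringIO()
dis.dis(add, file=func_buf)

# Disassemble a class: each method is listed under a "Disassembly of" header.
class_buf = io.StringIO()
dis.dis(Greeter, file=class_buf)

print(func_buf.getvalue())
print(class_buf.getvalue())
```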
2.3 Disassembling a PyCodeObject:
This is the form used most often when reverse engineering Python or analyzing pyc files.
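The sketch below imitates that situation under Python 3: a code object is serialized with marshal, which is how the body of a pyc file stores it, then loaded back and disassembled. It round-trips through memory instead of reading a real pyc file. With a real file the header must be skipped before calling marshal.loads, and the header size depends on the Python version (8 bytes on Python 2, 16 bytes on Python 3.7 and later).

```python
import dis
import io
import marshal

code = compile("print('hi')", "<example>", "exec")

# Serialize the code object the way a .pyc body stores it, then load it back.
blob = marshal.dumps(code)
restored = marshal.loads(blob)

# The restored object is an ordinary code object that dis can handle.
buf = io.StringIO()
dis.dis(restored, file=buf)
print(buf.getvalue())
```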
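2.4 Calling dis with no argument:
When dis.dis is called without an argument, it disassembles the code of the frame in which the last error occurred in the current Python shell (the last traceback) and prints the listing.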
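A sketch of the same behaviour in a script, assuming Python 3. Outside the interactive shell sys.last_traceback is normally not set, so this example catches an exception and passes its traceback to dis.distb explicitly. The function `fail` is a made-up example.

```python
import dis
import io
import sys

def fail():
    return 1 / 0

try:
    fail()
except ZeroDivisionError:
    # Take the traceback of the exception that is being handled.
    tb = sys.exc_info()[2]
    buf = io.StringIO()
    # distb walks to the innermost frame and disassembles its code object.
    dis.distb(tb, file=buf)
    listing = buf.getvalue()

print(listing)
```

The instruction that raised the exception is marked with `-->` in the listing.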
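3. Walking through the dis module
The dis module exposes a number of functions and attributes. Their usage is summarized in the table below:
Function or attribute | Description |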
---|---|
dis.dis([bytesource]) | Disassemble the bytesource object. bytesource can denote either a module, a class, a method, a function, or a code object. For a module, it disassembles all functions. For a class, it disassembles all methods. For a single code sequence, it prints one line per bytecode instruction. If no object is provided, it disassembles the last traceback. |
dis.distb([tb]) | Disassembles the top-of-stack function of a traceback, using the last traceback if none was passed. The instruction causing the exception is indicated. |
dis.disassemble(code[, lasti]) | Disassembles a code object, indicating the last instruction if lasti was provided. |
dis.disco(code[, lasti]) | A synonym for disassemble(). It is more convenient to type, and kept for compatibility with earlier Python releases. |
dis.findlinestarts(code) | This generator function uses the co_firstlineno and co_lnotab attributes of the code object code to find the offsets which are starts of lines in the source code. They are generated as (offset, lineno) pairs. |
dis.findlabels(code) | Detect all offsets in the code object code which are jump targets, and return a list of these offsets. |
dis.opname | Sequence of operation names, indexable using the bytecode. |
dis.opmap | Dictionary mapping operation names to bytecodes. |
dis.cmp_op | Sequence of all compare operation names. |
dis.hasconst | Sequence of bytecodes that access a constant. |
dis.hasfree | Sequence of bytecodes that access a free variable. |
dis.hasname | Sequence of bytecodes that access an attribute by name. |
dis.hasjrel | Sequence of bytecodes that have a relative jump target. |
dis.hasjabs | Sequence of bytecodes that have an absolute jump target. |
dis.haslocal | Sequence of bytecodes that access a local variable. |
dis.hascompare | Sequence of bytecodes of Boolean operations. |
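The table above is compiled from the official documentation and describes each function and attribute. The rest of this section describes how a call flows through the module.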
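A few of the lookup attributes from the table can be tried directly. This is a small sketch assuming Python 3, where the numeric opcode values differ from Python 2 but the attributes have the same meaning.

```python
import dis

# opmap maps a mnemonic to its numeric opcode; opname is the inverse lookup.
op = dis.opmap["LOAD_CONST"]
name = dis.opname[op]
print(op, name)

# Opcodes at or above HAVE_ARGUMENT carry an argument.
print(op >= dis.HAVE_ARGUMENT)

# cmp_op lists the comparison operator names.
print(dis.cmp_op)
```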
3.1
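dis.dis is the module's main entry point: calls into the module normally hand their argument to dis.dis (advanced users may also call other functions such as dis.distb directly for specific tasks).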
3.2
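dis.dis first inspects its argument and takes a different path depending on whether it received nothing, an object with a __dict__ (such as a class or module), a code object (PyCodeObject), or a raw bytecode string. For an object with a __dict__, dis iterates over the sorted items and recursively passes every value that can hold code to dis.dis.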
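The dispatch order can be sketched as a small classifier. `classify` is a hypothetical helper, not part of dis, and it uses the Python 3 attribute names `__func__` and `__code__` where the Python 2 source uses `im_func` and `func_code`.

```python
import types

def classify(x):
    """Hypothetical helper: name the branch dis.dis would take for x."""
    if x is None:
        return "last traceback"
    # Unwrap bound methods and functions down to their code object.
    if hasattr(x, "__func__"):
        x = x.__func__
    if hasattr(x, "__code__"):
        x = x.__code__
    if hasattr(x, "__dict__"):
        return "iterate over __dict__"
    if hasattr(x, "co_code"):
        return "code object"
    if isinstance(x, (bytes, bytearray)):
        return "raw bytecode"
    return "unsupported"

print(classify(None))
print(classify(classify))
print(classify(types.SimpleNamespace))
print(classify(b"\x00"))
print(classify(42))
```

The order matters: functions also have a __dict__, so they must be reduced to their code object before the __dict__ check.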
3.3
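After dis.dis has processed the argument, it is handed to either disassemble or disassemble_string. disassemble disassembles a code object, and disassemble_string disassembles a raw bytecode string. Because disassemble_string closely mirrors disassemble, it is not walked through separately.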
3.4
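disassemble translates a code object (PyCodeObject) into readable bytecode. It first calls findlabels and findlinestarts. findlabels collects the target offset of every jump instruction into a list. findlinestarts marks where each source line begins in the bytecode: according to its docstring it generates (offset, lineno) tuples, where offset is the bytecode offset and lineno is the source line number.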
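Both helpers can be called on their own. The sketch below assumes Python 3, where findlabels takes the raw bytecode and findlinestarts takes the code object, as in the Python 2 source. The function `sign` is a made-up example containing one conditional jump.

```python
import dis

def sign(n):
    if n < 0:
        return -1
    return 1

code = sign.__code__

# Jump targets, as offsets into the raw bytecode.
labels = dis.findlabels(code.co_code)

# (offset, lineno) pairs marking where each source line begins.
line_starts = list(dis.findlinestarts(code))

print(labels)
print(line_starts)
```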
3.5
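disassemble then translates the bytecode one instruction at a time, annotating each line with the relevant name or constant and with markers for the current instruction and for jump targets.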
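The per-instruction decoding can be illustrated with a stripped-down decoder for the Python 2 bytecode layout: one opcode byte, followed by two little-endian argument bytes when the opcode is at or above HAVE_ARGUMENT. The sketch runs on Python 3 and works on hand-built bytes. The opcode numbers are the Python 2.7 values as recalled here and should be treated as assumptions; `decode` is a hypothetical helper.

```python
# Opcode numbers as recalled from Python 2.7; treat them as assumptions.
HAVE_ARGUMENT = 90
EXTENDED_ARG = 145
OPNAMES = {83: "RETURN_VALUE", 100: "LOAD_CONST", 145: "EXTENDED_ARG"}

def decode(code):
    """Yield (offset, name, oparg) for Python 2.7 style bytecode."""
    i = 0
    extended_arg = 0
    while i < len(code):
        offset = i
        op = code[i]
        i += 1
        oparg = None
        if op >= HAVE_ARGUMENT:
            # Two little-endian argument bytes plus any pending high bits.
            oparg = code[i] + code[i + 1] * 256 + extended_arg
            extended_arg = 0
            i += 2
            if op == EXTENDED_ARG:
                # Keep the high 16 bits for the next instruction's argument.
                extended_arg = oparg * 65536
        yield offset, OPNAMES.get(op, "<%d>" % op), oparg

# Hand-built bytes: EXTENDED_ARG 1, LOAD_CONST 2, RETURN_VALUE.
sample = bytes([145, 1, 0, 100, 2, 0, 83])
decoded = list(decode(sample))
for row in decoded:
    print(row)
```

The LOAD_CONST argument comes out as 65538 (1 * 65536 + 2), which shows how EXTENDED_ARG widens the argument of the instruction that follows it.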
4. Annotated source of the dis module
"""Disassembler of Python byte code into mnemonics."""
import sys
import types
from opcode import *
from opcode import __all__ as _opcodes_all
__all__ = ["dis", "disassemble", "distb", "disco",
           "findlinestarts", "findlabels"] + _opcodes_all
del _opcodes_all
_have_code = (types.MethodType, types.FunctionType, types.CodeType,
              types.ClassType, type)
# Choose how to disassemble x based on its type. The co_code branch is the
# one usually taken when disassembling marshal data extracted from a pyc file.
def dis(x=None):
    """Disassemble classes, methods, functions, or code.
    With no argument, disassemble the last traceback.
    """
    if x is None:
        distb()
        return
    if isinstance(x, types.InstanceType):
        x = x.__class__
    if hasattr(x, 'im_func'):
        x = x.im_func
    if hasattr(x, 'func_code'):
        x = x.func_code
    if hasattr(x, '__dict__'):
        items = x.__dict__.items()
        items.sort()
        for name, x1 in items:
            if isinstance(x1, _have_code):
                print "Disassembly of %s:" % name
                try:
                    dis(x1)
                except TypeError, msg:
                    print "Sorry:", msg
                print
    elif hasattr(x, 'co_code'):
        disassemble(x)
    elif isinstance(x, str):
        disassemble_string(x)
    else:
        raise TypeError, \
              "don't know how to disassemble %s objects" % \
              type(x).__name__
# Used when dis() receives no argument: disassemble the frame of the last
# traceback.
def distb(tb=None):
    """Disassemble a traceback (default: last traceback)."""
    if tb is None:
        try:
            tb = sys.last_traceback
        except AttributeError:
            raise RuntimeError, "no last traceback to disassemble"
        while tb.tb_next: tb = tb.tb_next
    disassemble(tb.tb_frame.f_code, tb.tb_lasti)
# Main disassembly routine.
def disassemble(co, lasti=-1):
    """Disassemble a code object."""
    code = co.co_code
    labels = findlabels(code)
    linestarts = dict(findlinestarts(co))
    n = len(code)
    i = 0
    # High bits supplied by a preceding EXTENDED_ARG opcode, if any.
    extended_arg = 0
    free = None
    while i < n:
        c = code[i]
        op = ord(c)
        # Print the source line number where a new source line starts.
        if i in linestarts:
            if i > 0:
                print
            print "%3d" % linestarts[i],
        else:
            print '   ',
        if i == lasti: print '-->',
        else: print '   ',
        # Mark jump targets.
        if i in labels: print '>>',
        else: print '  ',
        # Print the bytecode offset and the opcode name.
        print repr(i).rjust(4),
        print opname[op].ljust(20),
        i = i+1
        if op >= HAVE_ARGUMENT:
            # Decode the argument and annotate it according to its kind.
            oparg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
            extended_arg = 0
            i = i+2
            if op == EXTENDED_ARG:
                extended_arg = oparg*65536L
            print repr(oparg).rjust(5),
            if op in hasconst:
                print '(' + repr(co.co_consts[oparg]) + ')',
            elif op in hasname:
                print '(' + co.co_names[oparg] + ')',
            elif op in hasjrel:
                print '(to ' + repr(i + oparg) + ')',
            elif op in haslocal:
                print '(' + co.co_varnames[oparg] + ')',
            elif op in hascompare:
                print '(' + cmp_op[oparg] + ')',
            elif op in hasfree:
                if free is None:
                    free = co.co_cellvars + co.co_freevars
                print '(' + free[oparg] + ')',
        print
# Main routine for disassembling a raw bytecode string.
def disassemble_string(code, lasti=-1, varnames=None, names=None,
                       constants=None):
    labels = findlabels(code)
    n = len(code)
    i = 0
    while i < n:
        c = code[i]
        op = ord(c)
        if i == lasti: print '-->',
        else: print '   ',
        if i in labels: print '>>',
        else: print '  ',
        print repr(i).rjust(4),
        print opname[op].ljust(15),
        i = i+1
        if op >= HAVE_ARGUMENT:
            oparg = ord(code[i]) + ord(code[i+1])*256
            i = i+2
            print repr(oparg).rjust(5),
            if op in hasconst:
                if constants:
                    print '(' + repr(constants[oparg]) + ')',
                else:
                    print '(%d)'%oparg,
            elif op in hasname:
                if names is not None:
                    print '(' + names[oparg] + ')',
                else:
                    print '(%d)'%oparg,
            elif op in hasjrel:
                print '(to ' + repr(i + oparg) + ')',
            elif op in haslocal:
                if varnames:
                    print '(' + varnames[oparg] + ')',
                else:
                    print '(%d)' % oparg,
            elif op in hascompare:
                print '(' + cmp_op[oparg] + ')',
        print
disco = disassemble                     # XXX For backwards compatibility
# Scan co_code for jump opcodes and store each jump target (a bytecode
# offset) in labels.
def findlabels(code):
    """Detect all offsets in a byte code which are jump targets.
    Return the list of offsets.
    """
    labels = []
    n = len(code)
    i = 0
    while i < n:
        c = code[i]
        op = ord(c)
        i = i+1
        if op >= HAVE_ARGUMENT:
            # Decode the two-byte argument.
            oparg = ord(code[i]) + ord(code[i+1])*256
            i = i+2
            label = -1
            # Compute the target from the jump kind and record it once.
            if op in hasjrel:
                label = i+oparg
            elif op in hasjabs:
                label = oparg
            if label >= 0:
                if label not in labels:
                    labels.append(label)
    return labels
def findlinestarts(code):
    """Find the offsets in a byte code which are start of lines in the source.
    Generate pairs (offset, lineno) as described in Python/compile.c.
    """
    # Bytecode offset increments.
    byte_increments = [ord(c) for c in code.co_lnotab[0::2]]
    # Source line increments.
    line_increments = [ord(c) for c in code.co_lnotab[1::2]]
    # Line number most recently yielded.
    lastlineno = None
    # Source line of the current bytecode.
    lineno = code.co_firstlineno
    addr = 0
    for byte_incr, line_incr in zip(byte_increments, line_increments):
        if byte_incr:
            if lineno != lastlineno:
                yield (addr, lineno)
                lastlineno = lineno
            addr += byte_incr
        lineno += line_incr
    # A pair is emitted only when the offset has advanced and the line number
    # differs from the last one yielded, so several blocks of bytecode that
    # belong to the same source line (a lambda, for example) produce a single
    # entry. The final line is emitted here, after the loop.
    if lineno != lastlineno:
        yield (addr, lineno)
def _test():
    """Simple test program to disassemble a file."""
    if sys.argv[1:]:
        if sys.argv[2:]:
            sys.stderr.write("usage: python dis.py [-|file]\n")
            sys.exit(2)
        fn = sys.argv[1]
        if not fn or fn == "-":
            fn = None
    else:
        fn = None
    if fn is None:
        f = sys.stdin
    else:
        f = open(fn)
    source = f.read()
    if fn is not None:
        f.close()
    else:
        fn = "<stdin>"
    code = compile(source, fn, "exec")
    dis(code)
if __name__ == "__main__":
    _test()
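The line-table walk in findlinestarts can be checked by hand on a small made-up table. The sketch below ports the same loop to Python 3 and runs it on synthetic bytes. `line_starts` is a hypothetical name, and the table does not come from real compiled code.

```python
def line_starts(lnotab, first_lineno):
    """Yield (offset, lineno) pairs from a Python 2 style co_lnotab."""
    byte_increments = lnotab[0::2]
    line_increments = lnotab[1::2]
    lastlineno = None
    lineno = first_lineno
    addr = 0
    for byte_incr, line_incr in zip(byte_increments, line_increments):
        if byte_incr:
            # Emit a pair only when the line number has changed.
            if lineno != lastlineno:
                yield addr, lineno
                lastlineno = lineno
            addr += byte_incr
        lineno += line_incr
    if lineno != lastlineno:
        yield addr, lineno

# Made-up table: (6 bytes, +1 line), (0 bytes, +255 lines), (8 bytes, +2 lines).
table = bytes([6, 1, 0, 255, 8, 2])
pairs = list(line_starts(table, 10))
print(pairs)
```

The entry with a zero byte increment adds only to the line number, so the second pair jumps from line 10 straight to line 266 at offset 6.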