Python Reverse Engineering (4): The Source Code of Python's Built-in dis.py Module Explained


1. Introduction

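The previous installment covered Python compilation and disassembly, and showed that the dis module can disassemble a PyCodeObject extracted from a compiled pyc file into readable bytecode. This installment walks through the source code of the dis module in detail.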

2. How the dis module works

Official documentation: https://docs.python.org/2/library/dis.html
The dis module supports the analysis of CPython bytecode by disassembling it. The CPython bytecode which this module takes as an input is defined in the file Include/opcode.h and used by the compiler and the interpreter.

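In other words, the dis module supports the analysis of Python bytecode by disassembling it. Its input can be compiled bytecode or, through the module's command-line entry point, Python source.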

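The dis module can translate a Python source file, a class or method in memory, or an unmarshalled PyCodeObject into the corresponding bytecode listing for analysis.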

2.1 Disassembling a source file:

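When a source file is given to the dis module as input (for example by running python -m dis file.py, which calls the _test function listed in section 4), the module compiles the file and prints the resulting bytecode listing directly.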
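As a rough illustration of the same compile-then-disassemble step, the sketch below works on a source string instead of a file. It is written for Python 3 rather than the Python 2 listing discussed in this article, and the names SOURCE and listing are made up for the example.

```python
import contextlib
import dis
import io

# Hypothetical two-line module used only for this illustration.
SOURCE = "x = 1\ny = x + 2\n"

# Compile the text as a module ("exec" mode), as dis._test does for a file.
code = compile(SOURCE, "<example>", "exec")

# dis.dis prints to stdout; capture the text so it can be inspected.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    dis.dis(code)
listing = buf.getvalue()

print(listing)
```

The exact opcodes in the listing differ between Python versions, so no expected output is shown here.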

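2.2 Disassembling a class or function in memory:

Passing an in-memory class, function, or method to the dis function of the dis module prints the compiled bytecode of that object. An instance is resolved to its class first. An object that carries no code, such as an integer, raises TypeError.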
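A minimal sketch of this usage, assuming Python 3.4 or later (where dis.dis accepts a file argument). The names add, Greeter, and listing_of are hypothetical and exist only for this example.

```python
import dis
import io


def add(a, b):
    return a + b


class Greeter(object):
    def greet(self, name):
        return "hello " + name


def listing_of(obj):
    """Return the text dis.dis would print for obj."""
    out = io.StringIO()
    dis.dis(obj, file=out)  # file= is a Python 3 addition
    return out.getvalue()


func_text = listing_of(add)       # a single function
class_text = listing_of(Greeter)  # every method found in Greeter.__dict__

print(func_text)
print(class_text)
```

For the class, each method is introduced by a "Disassembly of <name>:" line, matching the print statement in the dis function listed in section 4.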

2.3 Disassembling a PyCodeObject:

This is the form most often used in Python reverse engineering and pyc file analysis.
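The sketch below imitates that workflow in memory, under Python 3. A pyc file consists of a header, whose length depends on the Python version, followed by a marshalled code object. To stay independent of the header layout, the example skips file I/O and round-trips only the marshalled payload. The variable names are illustrative.

```python
import dis
import marshal

# Hypothetical source standing in for the module a .pyc was built from.
source = "def f(n):\n    return n * 2\n"
code = compile(source, "<example>", "exec")

# This is the part of a .pyc that follows the header.
payload = marshal.dumps(code)

# Reverse-engineering direction: bytes -> code object -> listing.
restored = marshal.loads(payload)
dis.dis(restored)
```

Marshal data is only guaranteed to load on the Python version that produced it, so a real pyc should be analysed with a matching interpreter.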

2.4 Calling dis with no argument:

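If dis.dis is called with no argument, it disassembles the last traceback by default: it prints the bytecode of the frame in which the most recent uncaught exception in the current Python shell was raised, and marks the failing instruction.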
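sys.last_traceback is only set by an interactive shell, so the sketch below passes a traceback to dis.distb explicitly. It assumes Python 3.4 or later for the file argument, and divide is a hypothetical function. Note that distb follows the tb_next chain only when no traceback is passed, so the example walks to the innermost frame itself.

```python
import dis
import io
import sys


def divide(a, b):
    return a / b


listing = ""
try:
    divide(1, 0)
except ZeroDivisionError:
    tb = sys.exc_info()[2]
    # distb only walks to the innermost frame when tb is omitted,
    # so do that step by hand here.
    while tb.tb_next:
        tb = tb.tb_next
    out = io.StringIO()
    dis.distb(tb, file=out)
    listing = out.getvalue()

print(listing)
```

The instruction that raised the exception is flagged with the "-->" marker, the same marker printed by disassemble when lasti is given.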

3. Reading the dis module

The dis module contains a number of functions and attributes. The table below shows how each is used.

Function or attribute | Description
dis.dis([bytesource]) | Disassemble the bytesource object. bytesource can denote either a module, a class, a method, a function, or a code object. For a module, it disassembles all functions. For a class, it disassembles all methods. For a single code sequence, it prints one line per bytecode instruction. If no object is provided, it disassembles the last traceback.
dis.distb([tb]) | Disassembles the top-of-stack function of a traceback, using the last traceback if none was passed. The instruction causing the exception is indicated.
dis.disassemble(code[, lasti]) | Disassembles a code object, indicating the last instruction if lasti was provided.
dis.disco(code[, lasti]) | A synonym for disassemble(). It is more convenient to type, and kept for compatibility with earlier Python releases.
dis.findlinestarts(code) | This generator function uses the co_firstlineno and co_lnotab attributes of the code object code to find the offsets which are starts of lines in the source code. They are generated as (offset, lineno) pairs.
dis.findlabels(code) | Detect all offsets in the code object code which are jump targets, and return a list of these offsets.
dis.opname | Sequence of operation names, indexable using the bytecode.
dis.opmap | Dictionary mapping operation names to bytecodes.
dis.cmp_op | Sequence of all compare operation names.
dis.hasconst | Sequence of bytecodes that access a constant.
dis.hasfree | Sequence of bytecodes that access a free variable.
dis.hasname | Sequence of bytecodes that access an attribute by name.
dis.hasjrel | Sequence of bytecodes that have a relative jump target.
dis.hasjabs | Sequence of bytecodes that have an absolute jump target.
dis.haslocal | Sequence of bytecodes that access a local variable.
dis.hascompare | Sequence of bytecodes of Boolean operations.

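The table above is compiled from the official documentation and describes each function and attribute. The rest of this section describes how the dis module runs.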
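The lookup attributes in the table can be tried out directly. The short sketch below, run under Python 3, uses only names listed above. The numeric value of an opcode varies between Python versions, so the example prints it rather than assuming it.

```python
import dis

# Name -> number through opmap, number -> name through opname.
load_const = dis.opmap["LOAD_CONST"]
round_trip = dis.opname[load_const]

# The has* sequences classify opcodes by the kind of argument they take.
takes_constant = load_const in dis.hasconst
is_comparison = dis.opmap["COMPARE_OP"] in dis.hascompare

print(load_const, round_trip, takes_constant, is_comparison)
print(dis.cmp_op)
```

These are the same tables that disassemble consults when it decides how to annotate an argument.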

3.1

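dis is the module's main function: the usual way to call into the module is to pass the argument to dis.dis (advanced users may also call dis.distb or other functions directly for specific tasks).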

3.2

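dis.dis first inspects its argument and calls a different function depending on whether it received no argument, an object with a __dict__ (such as a class or module), a PyCodeObject instance, or a raw bytecode string. For an object with a __dict__, dis iterates over the sorted items and recursively passes each value that can hold code to dis.dis.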
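The order of these checks can be sketched as a small classifier. This is not part of the dis module: classify is a hypothetical helper written for Python 3, so it uses __func__ and __code__ where the Python 2 source uses im_func and func_code, and bytes where Python 2 uses str.

```python
def classify(x=None):
    """Name the branch of dis.dis that x would reach (illustrative only)."""
    if x is None:
        return "distb"
    if hasattr(x, "__func__"):   # bound method -> underlying function
        x = x.__func__
    if hasattr(x, "__code__"):   # function -> its code object
        x = x.__code__
    if hasattr(x, "__dict__"):   # class or module: recurse over members
        return "recurse into __dict__"
    if hasattr(x, "co_code"):    # code object
        return "disassemble"
    if isinstance(x, bytes):     # raw bytecode
        return "disassemble_string"
    raise TypeError("don't know how to disassemble %s objects"
                    % type(x).__name__)


class Sample(object):
    def method(self):
        return 1


print(classify())
print(classify(Sample))
print(classify(Sample().method))
print(classify(b"\x00"))
```

Unwrapping methods and functions before the __dict__ test matters: a function object also has a __dict__, so testing in the other order would send functions down the wrong branch.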

3.3

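After dis has processed it, the argument ends up in either disassemble or disassemble_string. disassemble disassembles code objects, and disassemble_string disassembles raw bytecode strings. Because disassemble_string closely mirrors disassemble, it is not discussed separately.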

3.4

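disassemble turns a PyCodeObject instance into readable bytecode. It first calls findlabels and findlinestarts. findlabels collects the target offset of every jump instruction into a list. findlinestarts marks where each source line begins in the bytecode: according to its docstring it generates (offset, lineno) pairs, where offset is the bytecode offset and lineno is the source line number.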
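Both helpers can be called on their own. The sketch below runs them under Python 3 on a small hypothetical function, sign, chosen because its if statement compiles to a conditional jump. As in the listing in section 4, findlabels takes the raw bytecode (co_code) while findlinestarts takes the code object itself.

```python
import dis


def sign(n):
    if n < 0:
        return -1
    return 1


code = sign.__code__

# Offsets that some jump instruction points at (marked ">>" in listings).
labels = dis.findlabels(code.co_code)

# (offset, lineno) pairs: where each source line begins in the bytecode.
line_starts = list(dis.findlinestarts(code))

print(labels)
print(line_starts)
```

The concrete offsets depend on the Python version, so they are printed rather than stated here.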

3.5

disassemble then translates the bytecode one instruction at a time, adding the relevant variable names and marker annotations.
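The core of that loop is the instruction decoding: one opcode byte, followed by a two-byte little-endian argument when the opcode is at least HAVE_ARGUMENT. The sketch below reproduces only that step in Python 3 syntax. The decode function is hypothetical, and the opcode numbers are the Python 2.7 values (HAVE_ARGUMENT 90, EXTENDED_ARG 145, LOAD_CONST 100, RETURN_VALUE 83), hard-coded because later versions use a different encoding.

```python
HAVE_ARGUMENT = 90   # Python 2.7 value
EXTENDED_ARG = 145   # Python 2.7 value


def decode(code):
    """Yield (offset, opcode, oparg) from Python 2 style bytecode."""
    i = 0
    extended_arg = 0
    while i < len(code):
        offset = i
        op = code[i]
        i += 1
        oparg = None
        if op >= HAVE_ARGUMENT:
            # Two-byte little-endian argument plus any pending high bits.
            oparg = code[i] + code[i + 1] * 256 + extended_arg
            extended_arg = 0
            i += 2
            if op == EXTENDED_ARG:
                # Supplies the high 16 bits of the next argument.
                extended_arg = oparg * 65536
        yield offset, op, oparg


# LOAD_CONST 1 ; RETURN_VALUE
sample = bytes([100, 1, 0, 83])
decoded = list(decode(sample))
print(decoded)  # [(0, 100, 1), (3, 83, None)]

# EXTENDED_ARG 1 ; LOAD_CONST 2  -> effective argument 1 * 65536 + 2
wide = list(decode(bytes([145, 1, 0, 100, 2, 0])))
print(wide)  # [(0, 145, 1), (3, 100, 65538)]
```

Python 2 indexes a str and needs ord() on each character, which is why the original listing calls ord(c). Indexing bytes in Python 3 already yields integers.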

4. Annotated source of the dis module

"""Disassembler of Python byte code into mnemonics."""

import sys
import types

from opcode import *
from opcode import __all__ as _opcodes_all

__all__ = ["dis", "disassemble", "distb", "disco",
           "findlinestarts", "findlabels"] + _opcodes_all
del _opcodes_all

_have_code = (types.MethodType, types.FunctionType, types.CodeType,
              types.ClassType, type)

# Dispatch on the type of x to decide how to disassemble it. The co_code
# branch is the one normally reached when disassembling marshal data
# extracted from a pyc file.
def dis(x=None):
    """Disassemble classes, methods, functions, or code.
    With no argument, disassemble the last traceback.
    """
    if x is None:
        distb()
        return
    if isinstance(x, types.InstanceType):
        x = x.__class__
    if hasattr(x, 'im_func'):
        x = x.im_func
    if hasattr(x, 'func_code'):
        x = x.func_code
    if hasattr(x, '__dict__'):
        items = x.__dict__.items()
        items.sort()
        for name, x1 in items:
            if isinstance(x1, _have_code):
                print "Disassembly of %s:" % name
                try:
                    dis(x1)
                except TypeError, msg:
                    print "Sorry:", msg
                print
    elif hasattr(x, 'co_code'):
        disassemble(x)
    elif isinstance(x, str):
        disassemble_string(x)
    else:
        raise TypeError, \
              "don't know how to disassemble %s objects" % \
              type(x).__name__

# When no argument is passed, disassemble the frame of the last traceback.
def distb(tb=None):
    """Disassemble a traceback (default: last traceback)."""
    if tb is None:
        try:
            tb = sys.last_traceback
        except AttributeError:
            raise RuntimeError, "no last traceback to disassemble"
        while tb.tb_next: tb = tb.tb_next
    disassemble(tb.tb_frame.f_code, tb.tb_lasti)

# Main disassembly routine for code objects.
def disassemble(co, lasti=-1):
    """Disassemble a code object."""
    code = co.co_code
    labels = findlabels(code)
    linestarts = dict(findlinestarts(co))
    n = len(code)
    i = 0
    # extended_arg holds the high 16 bits supplied by a preceding EXTENDED_ARG.
    extended_arg = 0
    free = None
    while i < n:
        c = code[i]
        op = ord(c)
        # Print the source line number where a new source line starts.
        if i in linestarts:
            if i > 0:
                print
            print "%3d" % linestarts[i],
        else:
            print '   ',

        if i == lasti: print '-->',
        else: print '   ',
        # Mark jump targets with '>>'.
        if i in labels: print '>>',
        else: print '  ',
        # Print the bytecode offset and the opcode name.
        print repr(i).rjust(4),
        print opname[op].ljust(20),
        i = i+1
        if op >= HAVE_ARGUMENT:
            # Decode the two-byte little-endian argument, then annotate it
            # according to the category of the opcode.
            oparg = ord(code[i]) + ord(code[i+1])*256 + extended_arg
            extended_arg = 0
            i = i+2
            if op == EXTENDED_ARG:
                extended_arg = oparg*65536L
            print repr(oparg).rjust(5),
            if op in hasconst:
                print '(' + repr(co.co_consts[oparg]) + ')',
            elif op in hasname:
                print '(' + co.co_names[oparg] + ')',
            elif op in hasjrel:
                print '(to ' + repr(i + oparg) + ')',
            elif op in haslocal:
                print '(' + co.co_varnames[oparg] + ')',
            elif op in hascompare:
                print '(' + cmp_op[oparg] + ')',
            elif op in hasfree:
                if free is None:
                    free = co.co_cellvars + co.co_freevars
                print '(' + free[oparg] + ')',
        print

# Main disassembly routine for raw bytecode strings.
def disassemble_string(code, lasti=-1, varnames=None, names=None,
                       constants=None):
    labels = findlabels(code)
    n = len(code)
    i = 0
    while i < n:
        c = code[i]
        op = ord(c)
        if i == lasti: print '-->',
        else: print '   ',
        if i in labels: print '>>',
        else: print '  ',
        print repr(i).rjust(4),
        print opname[op].ljust(15),
        i = i+1
        if op >= HAVE_ARGUMENT:
            oparg = ord(code[i]) + ord(code[i+1])*256
            i = i+2
            print repr(oparg).rjust(5),
            if op in hasconst:
                if constants:
                    print '(' + repr(constants[oparg]) + ')',
                else:
                    print '(%d)'%oparg,
            elif op in hasname:
                if names is not None:
                    print '(' + names[oparg] + ')',
                else:
                    print '(%d)'%oparg,
            elif op in hasjrel:
                print '(to ' + repr(i + oparg) + ')',
            elif op in haslocal:
                if varnames:
                    print '(' + varnames[oparg] + ')',
                else:
                    print '(%d)' % oparg,
            elif op in hascompare:
                print '(' + cmp_op[oparg] + ')',
        print

disco = disassemble                     # XXX For backwards compatibility

# Scan co_code for jump opcodes and store each jump's target
# (a bytecode offset) in labels.
def findlabels(code):
    """Detect all offsets in a byte code which are jump targets.
    Return the list of offsets.
    """
    labels = []
    n = len(code)
    i = 0
    while i < n:
        c = code[i]
        op = ord(c)
        i = i+1
        if op >= HAVE_ARGUMENT:
            # Decode the two-byte little-endian argument.
            oparg = ord(code[i]) + ord(code[i+1])*256
            i = i+2
            label = -1
            # Compute the target according to the jump type (relative or
            # absolute) and add it to labels.
            if op in hasjrel:
                label = i+oparg
            elif op in hasjabs:
                label = oparg
            if label >= 0:
                if label not in labels:
                    labels.append(label)
    return labels

def findlinestarts(code):
    """Find the offsets in a byte code which are start of lines in the source.
    Generate pairs (offset, lineno) as described in Python/compile.c.
    """
    # Bytecode offset increments (even-indexed bytes of co_lnotab).
    byte_increments = [ord(c) for c in code.co_lnotab[0::2]]
    # Source line increments (odd-indexed bytes of co_lnotab).
    line_increments = [ord(c) for c in code.co_lnotab[1::2]]

    # Line number most recently yielded.
    lastlineno = None
    # Source line that the current bytecode offset belongs to.
    lineno = code.co_firstlineno
    addr = 0
    for byte_incr, line_incr in zip(byte_increments, line_increments):
        if byte_incr:
            if lineno != lastlineno:
                yield (addr, lineno)
                lastlineno = lineno
            addr += byte_incr
        lineno += line_incr
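    # After the loop, emit the last pending line start. The
    # lineno != lastlineno check suppresses a duplicate when consecutive
    # co_lnotab entries resolve to the same source line.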
    if lineno != lastlineno:
        yield (addr, lineno)

def _test():
    """Simple test program to disassemble a file."""
    if sys.argv[1:]:
        if sys.argv[2:]:
            sys.stderr.write("usage: python dis.py [-|file]\n")
            sys.exit(2)
        fn = sys.argv[1]
        if not fn or fn == "-":
            fn = None
    else:
        fn = None
    if fn is None:
        f = sys.stdin
    else:
        f = open(fn)
    source = f.read()
    if fn is not None:
        f.close()
    else:
        fn = "<stdin>"
    code = compile(source, fn, "exec")
    dis(code)

if __name__ == "__main__":
    _test()

