Python Code Statistics Tool

Tags: Python, code statistics

Python Code Statistics Tool
Preface
1. Problem Statement
2. Implementation
3. Verification

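Preface

This article extends the C code statistics tool from "Implementing a C Code Statistics Tool in Python (Parts 1-3)" so that it can also count the lines of Python scripts themselves.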

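1. Problem Statement

The C code statistics tool implemented earlier can only analyze and count C source files, but its design also applies to Python code and to other programming languages.

The difficulty in counting Python lines lies in comment lines, because Python has two ways of commenting: the simple, clear single-line comment and the more complex, ambiguous multi-line (block) comment. A single-line comment starts with the # (pound or hash) symbol and runs to the end of the physical line (a # inside a string literal does not start a comment). A multi-line comment can be written by prefixing every line with #, or by wrapping the text in an unassigned ("unnamed") triple-quoted string, that is, a multi-line string. Such a string acts as a multi-line comment unless it is the docstring of an object, in other words unless it is the first statement of a module, class, or function body.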
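The docstring rule can be checked directly. The following is a minimal sketch; the function names documented and commented are hypothetical and exist only for illustration. A string that is the first statement of a function body becomes its docstring, while the same string placed later is evaluated and discarded.

```python
def documented():
    """This string is the first statement, so it becomes the docstring."""
    return 1


def commented():
    value = 1
    """This string is not the first statement; it is evaluated and discarded."""
    return value


# The first function exposes its string through __doc__; the second has none.
print(documented.__doc__)
print(commented.__doc__)
```

The sketch runs unchanged under Python 2 and Python 3, since print is called with a single argument.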

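The script 總27_代7_注15_空5.py below (the name records 27 lines in total: 7 code, 15 comment, 5 blank) demonstrates the different comment styles. Note that it serves only as test data and is not a real-world script.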

#!/usr/bin/python
# -*- coding: utf-8 -*-

#comment3

print 'code1'
'''comment4
print """comment5"""
comment6'''

"""comment7
'''print 'comment8 and comment9'
"""
print 'code2'

def code3():
    """f = open('whatever', 'r')
    multiline comment 10,11,12 make up a doc string
    """
    print 'code4'
    '''
    print 'comment13, comment14 and comment15'
    '''
    return 'code5'

help(code3); print 'code6'
print code3.__doc__, 'code7'

Running the script under Python 2 produces the following output:

code1
code2
Help on function code3 in module __main__:

code3()
    f = open('whatever', 'r')
    multiline comment 10,11,12 make up a doc string

code6
f = open('whatever', 'r')
    multiline comment 10,11,12 make up a doc string
     code7

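Using unassigned triple-quoted strings as comments has the following drawbacks:

An unassigned string is not really a comment but a statement that generates no bytecode, so it must obey the indentation rules (a common source of errors).
Code that already contains the same kind of triple-quoted string cannot be commented out this way.
IDE syntax highlighting marks a triple-quoted string as a string, not as a comment.

In addition, most IDEs can comment out a selected code fragment automatically with the single-line comment marker. In IDLE (Python GUI), for example, Alt+3 comments out the selection and Alt+4 uncomments it. It is therefore advisable to always use # for multi-line comments and to reserve triple-quoted strings for temporarily disabling code blocks while debugging.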
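The indentation pitfall can be reproduced with the built-in compile() function. This is a minimal sketch: the GOOD and BAD sources and the compiles() helper are made up for illustration. A # comment may start in any column, whereas a string "comment" at column 0 is a statement that ends the function body, so the indented line after it is rejected.

```python
GOOD = (
    "def f():\n"
    "    x = 1\n"
    "# a hash comment may start in any column\n"
    "    return x\n"
)

BAD = (
    "def f():\n"
    "    x = 1\n"
    "'''a string used as a comment, placed at column 0'''\n"
    "    return x\n"
)


def compiles(source):
    """Return True if source compiles, False on a syntax or indentation error."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False


print(compiles(GOOD))
print(compiles(BAD))
```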

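2. Implementation

To support both C and Python code, the CalcLines() and CountFileLines() functions need slight changes. For the other functions, see the earlier articles in the C code statistics tool series. Most of the implementation needs little or no modification, which suggests that the original division of functions and the overall layout were sound.

For clarity, the original CalcLines() function is renamed CalcLinesCh(). Next comes CalcLinesPy(), which collects line information for Python scripts: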

def CalcLinesPy(line, isBlockComment):
    #isBlockComment[single quotes, double quotes]
    lineType, lineLen = 0, len(line)
    line = line + '\n\n' #pad two characters so that iChar+2 never goes out of range
    iChar, isLineComment = 0, False
    while iChar < lineLen:
        #Line terminators (Windows:\r\n; MacOS 9:\r; OS X&Unix:\n)
        #Do not write "if line[iChar] in os.linesep" (the file may come from another OS)
        if line[iChar] == '\r' or line[iChar] == '\n':
            break
        elif line[iChar] == ' ' or line[iChar] == '\t':   #whitespace
            iChar += 1; continue
        elif line[iChar] == '#':            #line comment
            isLineComment = True
            lineType |= 2
        elif line[iChar:iChar+3] == "'''":  #single-quote block comment
            if isBlockComment[0] or isBlockComment[1]:
                isBlockComment[0] = False
            else:
                isBlockComment[0] = True
            lineType |= 2; iChar += 2
        elif line[iChar:iChar+3] == '"""':  #double-quote block comment
            if isBlockComment[0] or isBlockComment[1]:
                isBlockComment[1] = False
            else:
                isBlockComment[1] = True
            lineType |= 2; iChar += 2
        else:
            if isLineComment or isBlockComment[0] or isBlockComment[1]:
                lineType |= 2
            else:
                lineType |= 1
        iChar += 1

    return lineType   #Bitmap: 0 blank line, 1 code, 2 comment, 3 code and comment
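CalcLinesPy() receives isBlockComment as a list rather than as two booleans, presumably so that the block-comment state set while scanning one line is still visible to the caller when the next line is scanned. The following minimal sketch shows the difference; the helper names mark_rebind and mark_in_place are hypothetical.

```python
def mark_rebind(flag):
    # Rebinding the parameter only changes the local name;
    # the caller's variable keeps its old value.
    flag = True


def mark_in_place(flags):
    # Mutating the list changes the object the caller also holds,
    # so the new state survives the call.
    flags[0] = True


flag = False
mark_rebind(flag)

flags = [False, False]
mark_in_place(flags)

print(flag)
print(flags)
```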

CountFileLines() is modified accordingly:

def CountFileLines(filePath, isRawReport=True, isShortName=False):
    fileExt = os.path.splitext(filePath)
    if fileExt[1] == '.c' or fileExt[1] == '.h':
        CalcLinesFunc = CalcLinesCh
    elif fileExt[1] == '.py':
        CalcLinesFunc = CalcLinesPy
    else:
        return

    isBlockComment = [False]*2  #or make it a global variable to preserve the previous value
    lineCountInfo = [0]*4       #[total lines, code lines, comment lines, blank lines]
    with open(filePath, 'r') as file:
        for line in file:
            lineType = CalcLinesFunc(line, isBlockComment)
            lineCountInfo[0] += 1
            if   lineType == 0:  lineCountInfo[3] += 1
            elif lineType == 1:  lineCountInfo[1] += 1
            elif lineType == 2:  lineCountInfo[2] += 1
            elif lineType == 3:  lineCountInfo[1] += 1; lineCountInfo[2] += 1
            else:
                assert False, 'Unexpected lineType: %d(0~3)!' %lineType

    if isRawReport:
        global rawCountInfo
        rawCountInfo[:-1] = [x+y for x,y in zip(rawCountInfo[:-1], lineCountInfo)]
        rawCountInfo[-1] += 1
    elif isShortName:
        detailCountInfo.append([os.path.basename(filePath), lineCountInfo])
    else:
        detailCountInfo.append([filePath, lineCountInfo])

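CountFileLines() determines the file type from the extension and calls the corresponding counting function; the isBlockComment list is extended to hold the state of the two kinds of Python block comment (triple single quotes and triple double quotes). Nothing else changes.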
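If more languages were added later, the if/elif chain on the extension could be replaced by a lookup table. This is only a sketch of that alternative, not the article's implementation: CLASSIFIERS, pick_classifier, and the two placeholder classifiers are hypothetical names standing in for CalcLinesCh() and CalcLinesPy().

```python
import os


def calc_lines_c(line, state):
    """Placeholder standing in for CalcLinesCh(); always reports a code line."""
    return 1


def calc_lines_py(line, state):
    """Placeholder standing in for CalcLinesPy(); always reports a code line."""
    return 1


# Hypothetical lookup table: file extension -> per-line classifier.
CLASSIFIERS = {
    '.c': calc_lines_c,
    '.h': calc_lines_c,
    '.py': calc_lines_py,
}


def pick_classifier(file_path):
    """Return the classifier for file_path, or None for unsupported types."""
    extension = os.path.splitext(file_path)[1]
    return CLASSIFIERS.get(extension)


print(pick_classifier('demo.py') is calc_lines_py)
print(pick_classifier('notes.txt') is None)
```

Supporting another language would then mean adding one dictionary entry, at the cost of one more level of indirection.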

3. Verification

The implementation in this article is named PCLineCounter.py. First, count a mix of C files and Python scripts:

E:\PyTest>PCLineCounter.py -d lctest
FileLines  CodeLines  CommentLines  EmptyLines  CommentPercent  FileName
6          3          4             0           0.57            E:\PyTest\lctest\hard.c
33         19         15            4           0.44            E:\PyTest\lctest\line.c
243        162        26            60          0.14            E:\PyTest\lctest\subdir\CLineCounter.py
44         34         3             7           0.08            E:\PyTest\lctest\subdir\test.c
44         34         3             7           0.08            E:\PyTest\lctest\test.c
27         7          15            5           0.68            E:\PyTest\lctest\總27_代7_注15_空5.py
------------------------------------------------------------------------------------------
397        259        66            83          0.20            <Total:6 Code Files>

Next, count pure Python scripts and analyze performance with cProfile:

E:\PyTest>python -m cProfile -s tottime PCLineCounter.py -d -b C:\Python27\Lib\encodings_trim > out.txt

An excerpt from out.txt follows:

FileLines  CodeLines  CommentLines  EmptyLines  CommentPercent  FileName
157        79         50            28          0.39            __init__.py
527        309        116           103         0.27            aliases.py
50         27         8             15          0.23            ascii.py
80         37         22            21          0.37            base64_codec.py
103        55         24            25          0.30            bz2_codec.py
69         38         10            21          0.21            charmap.py
307        285        262           16          0.48            cp1252.py
39         26         5             8           0.16            gb18030.py
39         26         5             8           0.16            gb2312.py
39         26         5             8           0.16            gbk.py
80         37         22            21          0.37            hex_codec.py
307        285        262           16          0.48            iso8859_1.py
50         27         8             15          0.23            latin_1.py
47         24         10            13          0.29            mbcs.py
83         58         36            17          0.38            palmos.py
175        148        127           18          0.46            ptcp154.py
238        183        28            30          0.13            punycode.py
76         45         15            16          0.25            quopri_codec.py
45         24         8             13          0.25            raw_unicode_escape.py
38         24         4             10          0.14            string_escape.py
49         26         9             14          0.26            undefined.py
45         24         8             13          0.25            unicode_escape.py
45         24         8             13          0.25            unicode_internal.py
126        93         10            23          0.10            utf_16.py
42         23         6             13          0.21            utf_16_be.py
42         23         6             13          0.21            utf_16_le.py
150        113        16            21          0.12            utf_32.py
37         23         5             9           0.18            utf_32_be.py
37         23         5             9           0.18            utf_32_le.py
42         23         6             13          0.21            utf_8.py
117        84         14            19          0.14            utf_8_sig.py
130        70         32            28          0.31            uu_codec.py
103        56         23            25          0.29            zlib_codec.py
------------------------------------------------------------------------------------------
3514       2368       1175          635         0.33            <Total:33 Code Files>
         10180 function calls (10146 primitive calls) in 0.168 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3514    0.118    0.000    0.122    0.000 PCLineCounter.py:45(CalcLinesPy)
       56    0.015    0.000    0.144    0.003 PCLineCounter.py:82(CountFileLines)
       33    0.005    0.000    0.005    0.000 {open}
        1    0.004    0.004    0.005    0.005 collections.py:1(<module>)
4028/4020    0.004    0.000    0.004    0.000 {len}
       57    0.004    0.000    0.004    0.000 {nt._isdir}
      259    0.002    0.000    0.003    0.000 ntpath.py:96(splitdrive)
        1    0.002    0.002    0.007    0.007 argparse.py:62(<module>)
        1    0.002    0.002    0.168    0.168 PCLineCounter.py:6(<module>)
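A comparable report can also be collected from inside a script with the standard cProfile and pstats modules. The sketch below targets Python 3 and profiles a hypothetical busy_work() function standing in for the line counter's main loop; it does not reproduce the figures above.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical stand-in for the line counter's main loop."""
    total = 0
    for i in range(n):
        total += len(str(i))
    return total


profiler = cProfile.Profile()
profiler.enable()
busy_work(10000)
profiler.disable()

# Sort by internal time, like "-s tottime" on the command line,
# and write the report into a string instead of stdout.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("tottime").print_stats(10)
report = stream.getvalue()
print(report)
```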

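To keep a single-file exe from growing too large, the author copied the Lib\encodings directory, deleted the language files that were not needed, and renamed the copy encodings_trim.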

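Finally, note that this implementation does not distinguish triple-quoted block comments from strings that take part in assignment or evaluation (such as s='''devil''' or money=10 if '''need''' else 0), nor does it handle unassigned strings delimited by single or double quotes (such as "I'm a bad comment"). After all, these are not good Python style. Moreover, practical use does not really require code and comment lines to be counted with great precision.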
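Where more precision is wanted, one possible direction (a different technique from the character scan used in this article) is the standard tokenize module, which reports # comments and string literals as distinct token types. The sketch below targets Python 3; comment_lines and SAMPLE are hypothetical names. It only finds # comments, and deciding whether a standalone string is a comment or a docstring would need further work.

```python
import io
import tokenize


def comment_lines(source):
    """Return the set of 1-based line numbers that contain a # comment token."""
    lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            lines.add(tok.start[0])
    return lines


SAMPLE = (
    "s = '''devil'''  # the string is assigned, only this part is a comment\n"
    "t = '# inside a string, so not a comment'\n"
    "# a real comment\n"
)

print(sorted(comment_lines(SAMPLE)))
```

Line 2 is not reported, because the # there belongs to a string token rather than to a comment token.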
