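Why performance testing matters
After finishing a Python project, we often want to optimize its performance. That requires a strategy: first identify where the bottlenecks in the code and its functions actually are. Ideally we would have a tool that measures the cost of every line of a target function, so that we can focus on the slowest part of the code. The open-source library line_profiler does exactly this; its repository is at github.com/rkern/line_profiler. The sections below cover installing and using the tool.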
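Installing line_profiler
line_profiler can be installed from source or with pip. Only the pip route, which is the simpler one, is covered here; for installation from source, see the official repository.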
[dechin@dechin-manjaro line_profiler]$ python3 -m pip install line_profiler
Collecting line_profiler
Downloading line_profiler-3.1.0-cp38-cp38-manylinux2010_x86_64.whl (65 kB)
|████████████████████████████████| 65 kB 221 kB/s
Requirement already satisfied: IPython in /home/dechin/anaconda3/lib/python3.8/site-packages (from line_profiler) (7.19.0)
Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from IPython->line_profiler) (3.0.8)
Requirement already satisfied: backcall in /home/dechin/anaconda3/lib/python3.8/site-packages (from IPython->line_profiler) (0.2.0)
Requirement already satisfied: pexpect>4.3; sys_platform != "win32" in /home/dechin/anaconda3/lib/python3.8/site-packages (from IPython->line_profiler) (4.8.0)
Requirement already satisfied: setuptools>=18.5 in /home/dechin/anaconda3/lib/python3.8/site-packages (from IPython->line_profiler) (50.3.1.post20201107)
Requirement already satisfied: jedi>=0.10 in /home/dechin/anaconda3/lib/python3.8/site-packages (from IPython->line_profiler) (0.17.1)
Requirement already satisfied: decorator in /home/dechin/anaconda3/lib/python3.8/site-packages (from IPython->line_profiler) (4.4.2)
Requirement already satisfied: traitlets>=4.2 in /home/dechin/anaconda3/lib/python3.8/site-packages (from IPython->line_profiler) (5.0.5)
Requirement already satisfied: pygments in /home/dechin/anaconda3/lib/python3.8/site-packages (from IPython->line_profiler) (2.7.2)
Requirement already satisfied: pickleshare in /home/dechin/anaconda3/lib/python3.8/site-packages (from IPython->line_profiler) (0.7.5)
Requirement already satisfied: wcwidth in /home/dechin/anaconda3/lib/python3.8/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->IPython->line_profiler) (0.2.5)
Requirement already satisfied: ptyprocess>=0.5 in /home/dechin/anaconda3/lib/python3.8/site-packages (from pexpect>4.3; sys_platform != "win32"->IPython->line_profiler) (0.6.0)
Requirement already satisfied: parso<0.8.0,>=0.7.0 in /home/dechin/anaconda3/lib/python3.8/site-packages (from jedi>=0.10->IPython->line_profiler) (0.7.0)
Requirement already satisfied: ipython-genutils in /home/dechin/anaconda3/lib/python3.8/site-packages (from traitlets>=4.2->IPython->line_profiler) (0.2.0)
Installing collected packages: line-profiler
Successfully installed line-profiler-3.1.0
As an aside, pip can also be pointed at a mirror for a single installation. The command below uses the PyPI mirror provided by Tencent:
python3 -m pip install -i https://mirrors.cloud.tencent.com/pypi/simple line_profiler
To make a mirror the permanent default, edit ~/.pip/pip.conf. A sample configuration (using the Huawei Cloud mirror) looks like this:
[global]
index-url = https://mirrors.huaweicloud.com/repository/pypi/simple
trusted-host = mirrors.huaweicloud.com
timeout = 120
Using line_profiler in the code to be optimized
Let's go straight to an example:
# line_profiler_test.py
from line_profiler import LineProfiler  # not used below; kernprof -l provides `profile`
import numpy as np

@profile
def test_profiler():
    for i in range(100):
        a = np.random.randn(100)    # 100 samples
        b = np.random.randn(1000)   # 1000 samples
        c = np.random.randn(10000)  # 10000 samples
    return None

if __name__ == '__main__':
    test_profiler()
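This example defines a function to be tested, test_profiler, which contains several calls to numpy.random.randn whose cost we want to measure. The setup is simply to place the decorator named profile above each function that should be analyzed line by line. The script also imports the LineProfiler class, but it is not actually used here: kernprof -l makes profile available as a builtin when it runs the script, which also means that running the file with plain python3 raises a NameError at the @profile line. For the usage and principles of Python decorators, see this blog post. One more point to note: line_profiler only analyzes the body of the decorated function. If that function calls other functions, the profiler does not descend into them, with the exception of nested functions defined inside it.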
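A simple performance analysis with line_profiler
Using line_profiler is fairly simple and takes two steps: run the script under kernprof to record the timings, then use python to display the results.
- Once the function to be analyzed is defined, run the script with kernprof, which writes a binary lprof file: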
[dechin-manjaro line_profiler]# kernprof -l line_profiler_test.py
Wrote profile results to line_profiler_test.py.lprof
When the command finishes, an lprof file appears in the current directory:
[dechin-manjaro line_profiler]# ll
總用量 8
-rw-r--r-- 1 dechin dechin 304 1月 20 16:00 line_profiler_test.py
-rw-r--r-- 1 root root 185 1月 20 16:00 line_profiler_test.py.lprof
- Use python3 to read the binary lprof file:
[dechin-manjaro line_profiler]# python3 -m line_profiler line_profiler_test.py.lprof
Timer unit: 1e-06 s
Total time: 0.022633 s
File: line_profiler_test.py
Function: test_profiler at line 5
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     5                                           @profile
     6                                           def test_profiler():
     7       101         40.0      0.4      0.2      for i in range(100):
     8       100        332.0      3.3      1.5          a = np.random.randn(100)
     9       100       2092.0     20.9      9.2          b = np.random.randn(1000)
    10       100      20169.0    201.7     89.1          c = np.random.randn(10000)
    11         1          0.0      0.0      0.0      return None
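This gives us the line-by-line results directly. The columns are, in order: the line number in the source file (Line #), the number of times the line was executed (Hits), the total time spent on the line (Time), the average time per execution (Per Hit), the line's share of the function's total time (% Time), and the source code itself (Line Contents). Time and Per Hit are given in timer units, here 1e-06 s, i.e. microseconds. That is essentially all there is to using line_profiler, but we would like to explore it further with a more practical case; interested readers can read on.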
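To make the relationship between the columns concrete, the sketch below recomputes Per Hit and % Time from the Hits and Time values of the first profile in this article. The numbers are copied from that output; the names rows and derived_columns are introduced here only for illustration, and the rounding to one decimal place is an assumption chosen to match the printed report.

```python
# (line number, hits, time in timer units), copied from the profile of test_profiler.
TIMER_UNIT_S = 1e-06
rows = [
    (7, 101, 40.0),
    (8, 100, 332.0),
    (9, 100, 2092.0),
    (10, 100, 20169.0),
    (11, 1, 0.0),
]

# The function's total time is the sum of the per-line times.
total_units = sum(time for _, _, time in rows)
total_seconds = total_units * TIMER_UNIT_S


def derived_columns(hits, time):
    """Return (Per Hit, % Time), each rounded to one decimal place."""
    per_hit = time / hits
    percent = 100.0 * time / total_units
    return round(per_hit, 1), round(percent, 1)


for line_no, hits, time in rows:
    per_hit, percent = derived_columns(hits, time)
    print(f"{line_no:>6} {hits:>9} {time:>12.1f} {per_hit:>8.1f} {percent:>8.1f}")
```

The per-line times add up to 22633 timer units, which agrees with the reported total of 0.022633 s.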
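Using line_profiler to compare the efficiency of sin across libraries
Here we test the sine function as implemented in several libraries, including one of our own that wraps Fortran's intrinsic SIN function.
Before the line_profiler demonstration, let's first see how to turn a Fortran f90 file into a shared library that Python can call.
- First, install gfortran on Manjaro Linux: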
[dechin-manjaro line_profiler]# pacman -S gcc-fortran
正在解析依賴關系...
正在查找軟件包沖突...
軟件包 (1) gcc-fortran-10.2.0-4
下載大小: 9.44 MiB
全部安裝大小: 31.01 MiB
:: 進行安裝嗎? [Y/n] Y
:: 正在獲取軟件包......
gcc-fortran-10.2.0-4-x86_64 9.4 MiB 6.70 MiB/s 00:01 [#######################################################################################] 100%
(1/1) 正在檢查密鑰環里的密鑰 [#######################################################################################] 100%
(1/1) 正在檢查軟件包完整性 [#######################################################################################] 100%
(1/1) 正在加載軟件包文件 [#######################################################################################] 100%
(1/1) 正在檢查文件沖突 [#######################################################################################] 100%
(1/1) 正在檢查可用存儲空間 [#######################################################################################] 100%
:: 正在處理軟件包的變化...
(1/1) 正在安裝 gcc-fortran [#######################################################################################] 100%
:: 正在運行事務后鈎子函數...
(1/2) Arming ConditionNeedsUpdate...
(2/2) Updating the info directory file...
- Create a simple Fortran file, fmath.f90, that returns the value of the sine function:
subroutine fsin(theta,result)
    implicit none
    real*8::theta
    real*8,intent(out)::result
    result=SIN(theta)
end subroutine
- Use f2py to compile the Fortran file into a shared library named fmath:
[dechin-manjaro line_profiler]# f2py -c -m fmath fmath.f90
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "fmath" sources
f2py options: []
f2py:> /tmp/tmpup5ia9lf/src.linux-x86_64-3.8/fmathmodule.c
creating /tmp/tmpup5ia9lf/src.linux-x86_64-3.8
Reading fortran codes...
Reading file 'fmath.f90' (format:free)
Post-processing...
Block: fmath
Block: fsin
Post-processing (stage 2)...
Building modules...
Building module "fmath"...
Constructing wrapper function "fsin"...
result = fsin(theta)
Wrote C/API module "fmath" to file "/tmp/tmpup5ia9lf/src.linux-x86_64-3.8/fmathmodule.c"
adding '/tmp/tmpup5ia9lf/src.linux-x86_64-3.8/fortranobject.c' to sources.
adding '/tmp/tmpup5ia9lf/src.linux-x86_64-3.8' to include_dirs.
copying /home/dechin/anaconda3/lib/python3.8/site-packages/numpy/f2py/src/fortranobject.c -> /tmp/tmpup5ia9lf/src.linux-x86_64-3.8
copying /home/dechin/anaconda3/lib/python3.8/site-packages/numpy/f2py/src/fortranobject.h -> /tmp/tmpup5ia9lf/src.linux-x86_64-3.8
build_src: building npy-pkg config files
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
get_default_fcompiler: matching types: '['gnu95', 'intel', 'lahey', 'pg', 'absoft', 'nag', 'vast', 'compaq', 'intele', 'intelem', 'gnu', 'g95', 'pathf95', 'nagfor']'
customize Gnu95FCompiler
Found executable /usr/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using build_ext
building 'fmath' extension
compiling C sources
C compiler: gcc -pthread -B /home/dechin/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC
creating /tmp/tmpup5ia9lf/tmp
creating /tmp/tmpup5ia9lf/tmp/tmpup5ia9lf
creating /tmp/tmpup5ia9lf/tmp/tmpup5ia9lf/src.linux-x86_64-3.8
compile options: '-I/tmp/tmpup5ia9lf/src.linux-x86_64-3.8 -I/home/dechin/anaconda3/lib/python3.8/site-packages/numpy/core/include -I/home/dechin/anaconda3/include/python3.8 -c'
gcc: /tmp/tmpup5ia9lf/src.linux-x86_64-3.8/fmathmodule.c
gcc: /tmp/tmpup5ia9lf/src.linux-x86_64-3.8/fortranobject.c
In file included from /home/dechin/anaconda3/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822,
from /home/dechin/anaconda3/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /home/dechin/anaconda3/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from /tmp/tmpup5ia9lf/src.linux-x86_64-3.8/fortranobject.h:13,
from /tmp/tmpup5ia9lf/src.linux-x86_64-3.8/fmathmodule.c:15:
/home/dechin/anaconda3/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: 警告:#warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
17 | #warning "Using deprecated NumPy API, disable it with " \
| ^~~~~~~
In file included from /home/dechin/anaconda3/lib/python3.8/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822,
from /home/dechin/anaconda3/lib/python3.8/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
from /home/dechin/anaconda3/lib/python3.8/site-packages/numpy/core/include/numpy/arrayobject.h:4,
from /tmp/tmpup5ia9lf/src.linux-x86_64-3.8/fortranobject.h:13,
from /tmp/tmpup5ia9lf/src.linux-x86_64-3.8/fortranobject.c:2:
/home/dechin/anaconda3/lib/python3.8/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: 警告:#warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
17 | #warning "Using deprecated NumPy API, disable it with " \
| ^~~~~~~
compiling Fortran sources
Fortran f77 compiler: /usr/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -fPIC -O3 -funroll-loops
Fortran f90 compiler: /usr/bin/gfortran -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops
Fortran fix compiler: /usr/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -Wall -g -fno-second-underscore -fPIC -O3 -funroll-loops
compile options: '-I/tmp/tmpup5ia9lf/src.linux-x86_64-3.8 -I/home/dechin/anaconda3/lib/python3.8/site-packages/numpy/core/include -I/home/dechin/anaconda3/include/python3.8 -c'
gfortran:f90: fmath.f90
/usr/bin/gfortran -Wall -g -Wall -g -shared /tmp/tmpup5ia9lf/tmp/tmpup5ia9lf/src.linux-x86_64-3.8/fmathmodule.o /tmp/tmpup5ia9lf/tmp/tmpup5ia9lf/src.linux-x86_64-3.8/fortranobject.o /tmp/tmpup5ia9lf/fmath.o -L/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib -L/usr/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../lib -lgfortran -o ./fmath.cpython-38-x86_64-linux-gnu.so
Removing build directory /tmp/tmpup5ia9lf
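The build emits a few warnings, but they do not affect normal use. Once compilation finishes, an so file appears in the current directory (on Windows the shared library will likely have a different file type):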
[dechin-manjaro line_profiler]# ll
總用量 120
-rwxr-xr-x 1 root root 107256 1月 20 16:40 fmath.cpython-38-x86_64-linux-gnu.so
-rw-r--r-- 1 root root 150 1月 20 16:40 fmath.f90
-rw-r--r-- 1 dechin dechin 304 1月 20 16:00 line_profiler_test.py
-rw-r--r-- 1 root root 185 1月 20 16:00 line_profiler_test.py.lprof
- Use ipython to check that the shared library works:
[dechin-manjaro line_profiler]# ipython
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from fmath import fsin
In [2]: print (fsin(3.14))
0.0015926529164868282
In [3]: print (fsin(3.1415926))
5.3589793170057245e-08
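As we can see, the Fortran-based sine function works. Next, let's formally compare the performance of several sine implementations (some may share the same underlying implementation; here we treat them as black boxes).
First, as before, create the Python file to be tested, sin_profiler_test.py: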
# sin_profiler_test.py
from line_profiler import LineProfiler
import random
from numpy import sin as numpy_sin
from math import sin as math_sin
# from cupy import sin as cupy_sin
from cmath import sin as cmath_sin
from fmath import fsin as fortran_sin

@profile
def test_profiler():
    for i in range(100000):
        r = random.random()
        a = numpy_sin(r)
        b = math_sin(r)
        # c = cupy_sin(r)
        d = cmath_sin(r)
        e = fortran_sin(r)
    return None

if __name__ == '__main__':
    test_profiler()
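The line_profiler setup is the same as in the previous example. We test four sine implementations: those of the three libraries numpy, math and cmath, plus our own Fortran version, which plugs into Python through the shared library built with f2py above. The cupy library could not be installed successfully, so it cannot be tested for now and its lines are commented out. Next, as before, run the script under kernprof: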
[dechin-manjaro line_profiler]# kernprof -l sin_profiler_test.py
Wrote profile results to sin_profiler_test.py.lprof
Finally, display the results with python3:
[dechin-manjaro line_profiler]# python3 -m line_profiler sin_profiler_test.py.lprof
Timer unit: 1e-06 s
Total time: 0.261304 s
File: sin_profiler_test.py
Function: test_profiler at line 10
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    10                                           @profile
    11                                           def test_profiler():
    12    100001      28032.0      0.3     10.7      for i in range(100000):
    13    100000      33995.0      0.3     13.0          r = random.random()
    14    100000      86870.0      0.9     33.2          a = numpy_sin(r)
    15    100000      33374.0      0.3     12.8          b = math_sin(r)
    16                                                   # c = cupy_sin(r)
    17    100000      40179.0      0.4     15.4          d = cmath_sin(r)
    18    100000      38854.0      0.4     14.9          e = fortran_sin(r)
    19         1          0.0      0.0      0.0      return None
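From these results, of the four implementations tested, math is the most efficient and numpy the least. Our own Fortran wrapper is roughly twice as fast as numpy's implementation, second only to math. Since only single function calls are involved here, we can also measure them with IPython's built-in %timeit magic: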
[dechin-manjaro line_profiler]# ipython
Python 3.8.5 (default, Sep 4 2020, 07:30:14)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.19.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: from fmath import fsin
In [2]: import random
In [3]: %timeit fsin(random.random())
145 ns ± 2.38 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [4]: from math import sin as math_sin
In [5]: %timeit math_sin(random.random())
107 ns ± 0.116 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [6]: from numpy import sin as numpy_sin
In [7]: %timeit numpy_sin(random.random())
611 ns ± 4.28 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [8]: from cmath import sin as cmath_sin
In [9]: %timeit cmath_sin(random.random())
151 ns ± 1.01 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
The ranking in these results matches the earlier one, but because the random call and the sine call are timed together, the absolute numbers differ somewhat.
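One way to separate the cost of random.random() from the cost of the sine call is to generate the argument once in a setup step and time only the call itself. The sketch below does this with the standard-library timeit module for math and cmath only, so that it runs without numpy or the compiled fmath module. The helper time_call and the loop counts are assumptions chosen for illustration, and the measured numbers will vary from machine to machine.

```python
import timeit


def time_call(stmt, setup, number=100_000, repeat=5):
    """Return the best observed time per call of `stmt`, in nanoseconds."""
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e9


# The random argument is created once in the setup and is not part of the timing.
SETUP = "import random, math, cmath; r = random.random()"

timings = {
    "random.random()": time_call("random.random()", SETUP),
    "math.sin(r)": time_call("math.sin(r)", SETUP),
    "cmath.sin(r)": time_call("cmath.sin(r)", SETUP),
    "math.sin(random.random())": time_call("math.sin(random.random())", SETUP),
}

for stmt, ns in timings.items():
    print(f"{stmt:<28} {ns:8.1f} ns per call")
```

Comparing the math.sin(r) row with the math.sin(random.random()) row gives a rough idea of how much of each %timeit figure above comes from generating the random number.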
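Summary
This article introduced line_profiler, a line-by-line performance analysis tool for Python. With a simple decorator it can pinpoint a program's performance bottlenecks so that they can be optimized in a targeted way. The tests also show that different implementations of the sine function differ in performance, although the difference goes unnoticed when the function is called infrequently. Even the sine function has many possible implementations, such as various series expansions, and table lookup remains one of the most popular and fastest approaches. Different algorithms and different language implementations can therefore give quite different results. In these tests the observed ranking is math < fortran < cmath < numpy, with running time increasing from left to right.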
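The ranking can be recomputed from the Time column of the sin_profiler_test.py profile. The sketch below uses only the figures reported in that output; the dictionary profiled_time and the ratio relative to the fastest entry are illustrative choices. The times include line_profiler's own per-line overhead, so the ratios are indicative rather than exact.

```python
# Time column (timer units of 1e-06 s) for 100000 calls each,
# copied from the profile of sin_profiler_test.py.
profiled_time = {
    "numpy": 86870.0,
    "math": 33374.0,
    "cmath": 40179.0,
    "fortran": 38854.0,
}

# Sort implementations from fastest to slowest.
ranking = sorted(profiled_time, key=profiled_time.get)
fastest = profiled_time[ranking[0]]

# Express each total as a multiple of the fastest implementation's time.
relative = {name: round(profiled_time[name] / fastest, 2) for name in ranking}

print(" < ".join(ranking))
for name, ratio in relative.items():
    print(f"{name:>8}: {ratio:.2f}x the time of {ranking[0]}")
```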
Copyright notice
This article was first published at: https://www.cnblogs.com/dechinphy/p/line-profiler.html
Author ID: DechinPhy
More original articles: https://www.cnblogs.com/dechinphy/
