Testing Cython + NumPy speedups (official demo)

Reposted from the original article, 2020-07-29 21:47. Category: [T05] Python / Cython

http://docs.cython.org/en/latest/src/tutorial/numpy.html

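Working with NumPy in Cython

Note

Cython 0.16 introduced typed memoryviews as a successor to the NumPy integration described here. They are easier to use than the buffer syntax below, have less overhead, and can be passed around without requiring the GIL. They should be preferred to the syntax presented on this page. See "Cython for NumPy users".

You can use NumPy from Cython exactly as in regular Python, but by doing so you lose potentially large speedups, because Cython supports fast access to NumPy arrays. Let's see how this works with a simple example.

The code below performs a 2D discrete convolution of an image with a filter (you can surely do better; it serves for demonstration purposes). It is both valid Python and valid Cython code. I'll refer to it as convolve_py.py for the Python version and convolve1.pyx for the Cython version; Cython uses ".pyx" as its file suffix.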

 
    import numpy as np


    def naive_convolve(f, g):
        # f is an image and is indexed by (v, w)
        # g is a filter kernel and is indexed by (s, t),
        #   it needs odd dimensions
        # h is the output image and is indexed by (x, y),
        #   it is not cropped
        if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
            raise ValueError("Only odd dimensions on filter supported")
        # smid and tmid are number of pixels between the center pixel
        # and the edge, ie for a 5x5 filter they will be 2.
        #
        # The output size is calculated by adding smid, tmid to each
        # side of the dimensions of the input image.
        vmax = f.shape[0]
        wmax = f.shape[1]
        smax = g.shape[0]
        tmax = g.shape[1]
        smid = smax // 2
        tmid = tmax // 2
        xmax = vmax + 2 * smid
        ymax = wmax + 2 * tmid
        # Allocate result image.
        h = np.zeros([xmax, ymax], dtype=f.dtype)
        # Do convolution
        for x in range(xmax):
            for y in range(ymax):
                # Calculate pixel value for h at (x,y). Sum one component
                # for each pixel (s, t) of the filter g.
                s_from = max(smid - x, -smid)
                s_to = min((xmax - x) - smid, smid + 1)
                t_from = max(tmid - y, -tmid)
                t_to = min((ymax - y) - tmid, tmid + 1)
                value = 0
                for s in range(s_from, s_to):
                    for t in range(t_from, t_to):
                        v = x - smid + s
                        w = y - tmid + t
                        value += g[smid - s, tmid - t] * f[v, w]
                h[x, y] = value
        return h
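As a cross-check for the loop version, the same uncropped ("full") convolution can be sketched by accumulating one shifted copy of the image per kernel entry. The name `full_convolve_shifted` is made up for this illustration and is not part of the Cython tutorial; the sketch assumes f and g share a dtype.

```python
import numpy as np


def full_convolve_shifted(f, g):
    """Uncropped 2D convolution of image f with kernel g (hypothetical helper)."""
    vmax, wmax = f.shape
    smax, tmax = g.shape
    # The output grows by (kernel size - 1) along each axis, which equals
    # vmax + 2 * smid and wmax + 2 * tmid for odd kernel sizes.
    h = np.zeros((vmax + smax - 1, wmax + tmax - 1), dtype=f.dtype)
    for s in range(smax):
        for t in range(tmax):
            # Kernel entry g[s, t] contributes a copy of f shifted by (s, t).
            h[s:s + vmax, t:t + wmax] += g[s, t] * f
    return h


f = np.array([[1, 1, 1]])
g = np.array([[1], [2], [1]])
result = full_convolve_shifted(f, g)
print(result)
```

For the 1x3 image and 3x1 kernel used in the test session, this gives the rows [1, 1, 1], [2, 2, 2], [1, 1, 1]. It only loops over kernel entries, so it is a handy reference when checking the output of a compiled variant.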
         

This should be compiled to produce yourmod.so (on Linux; on Windows it will be yourmod.pyd). We run a Python session to test both the Python version (imported from the .py file) and the compiled Cython module.

 
    In [1]: import numpy as np
    In [2]: import convolve_py
    In [3]: convolve_py.naive_convolve(np.array([[1, 1, 1]], dtype=np.int),
    ...     np.array([[1],[2],[1]], dtype=np.int))
    Out [3]:
    array([[1, 1, 1],
           [2, 2, 2],
           [1, 1, 1]])
    In [4]: import convolve1
    In [4]: convolve1.naive_convolve(np.array([[1, 1, 1]], dtype=np.int),
    ...     np.array([[1],[2],[1]], dtype=np.int))
    Out [4]:
    array([[1, 1, 1],
           [2, 2, 2],
           [1, 1, 1]])
    In [11]: N = 100
    In [12]: f = np.arange(N*N, dtype=np.int).reshape((N,N))
    In [13]: g = np.arange(81, dtype=np.int).reshape((9, 9))
    In [19]: %timeit -n2 -r3 convolve_py.naive_convolve(f, g)
    2 loops, best of 3: 1.86 s per loop
    In [20]: %timeit -n2 -r3 convolve1.naive_convolve(f, g)
    2 loops, best of 3: 1.41 s per loop
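Outside IPython, a "best of N" figure like the ones above can be approximated with the standard-library `timeit` module. This is a minimal sketch: `best_per_loop` and `stand_in_workload` are hypothetical names, and the workload is a placeholder for a call such as convolve_py.naive_convolve(f, g).

```python
import timeit


def best_per_loop(func, number=2, repeat=3):
    """Best average time per call in seconds (hypothetical helper)."""
    # timeit.repeat returns one total per repetition; each total covers
    # `number` consecutive calls of func.
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


def stand_in_workload():
    # Placeholder for the function being benchmarked.
    return sum(i * i for i in range(10_000))


seconds = best_per_loop(stand_in_workload, number=2, repeat=3)
print(f"2 loops, best of 3: {seconds * 1e3:.3f} ms per loop")
```

Taking the minimum over repetitions mirrors the "best of" wording in the session output and reduces the influence of unrelated system load.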
         

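There is not much of a difference yet, because the C code still does exactly what the Python interpreter does (meaning, for instance, that a new object is allocated for each number used). Look at the generated html file to see what even the simplest statements require; you will get the point quickly. We need to give Cython more information; we need to add types.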
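The per-number object allocation can be observed from plain Python. In this small sketch, each index lookup on a NumPy array hands back a freshly created scalar object rather than a raw C value:

```python
import numpy as np

a = np.arange(5)
first = a[0]
second = a[0]

print(type(first))       # a NumPy integer scalar type
print(first is second)   # False: two lookups, two separate objects
print(first == second)   # True: same value
```

Inside a doubly nested loop over image and kernel, this boxing happens for every lookup, which is the overhead that typing is meant to remove.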

Adding types

To add types we use custom Cython syntax, so we now break Python source compatibility. Consider this code (read the comments!):

 
    # tag: numpy
    # You can ignore the previous line.
    # It's for internal testing of the cython documentation.

    import numpy as np

    # "cimport" is used to import special compile-time information
    # about the numpy module (this is stored in a file numpy.pxd which is
    # currently part of the Cython distribution).
    cimport numpy as np

    # We now need to fix a datatype for our arrays. I've used the variable
    # DTYPE for this, which is assigned to the usual NumPy runtime
    # type info object.
    # (The original listing used np.int; that alias was removed in
    # NumPy 1.24, and np.int_ names the dtype it resolved to.)
    DTYPE = np.int_

    # "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
    # every type in the numpy module there's a corresponding compile-time
    # type with a _t-suffix.
    ctypedef np.int_t DTYPE_t

    # "def" can type its arguments but not have a return type. The type of the
    # arguments for a "def" function is checked at run-time when entering the
    # function.
    #
    # The arrays f, g and h are typed as "np.ndarray" instances. The only effect
    # this has is to a) insert checks that the function arguments really are
    # NumPy arrays, and b) make some attribute access like f.shape[0] much
    # more efficient. (In this example this doesn't matter though.)
    def naive_convolve(np.ndarray f, np.ndarray g):
        if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
            raise ValueError("Only odd dimensions on filter supported")
        assert f.dtype == DTYPE and g.dtype == DTYPE
        # The "cdef" keyword is also used within functions to type variables. It
        # can only be used at the top indentation level (there are non-trivial
        # problems with allowing them in other places, though we'd love to see
        # good and thought out proposals for it).
        #
        # For the indices, the "int" type is used. This corresponds to a C int,
        # other C types (like "unsigned int") could have been used instead.
        # Purists could use "Py_ssize_t" which is the proper Python type for
        # array indices.
        cdef int vmax = f.shape[0]
        cdef int wmax = f.shape[1]
        cdef int smax = g.shape[0]
        cdef int tmax = g.shape[1]
        cdef int smid = smax // 2
        cdef int tmid = tmax // 2
        cdef int xmax = vmax + 2 * smid
        cdef int ymax = wmax + 2 * tmid
        cdef np.ndarray h = np.zeros([xmax, ymax], dtype=DTYPE)
        cdef int x, y, s, t, v, w
        # It is very important to type ALL your variables. You do not get any
        # warnings if not, only much slower code (they are implicitly typed as
        # Python objects).
        cdef int s_from, s_to, t_from, t_to
        # For the value variable, we want to use the same data type as is
        # stored in the array, so we use "DTYPE_t" as defined above.
        # NB! An important side-effect of this is that if "value" overflows its
        # datatype size, it will simply wrap around like in C, rather than raise
        # an error like in Python.
        cdef DTYPE_t value
        for x in range(xmax):
            for y in range(ymax):
                s_from = max(smid - x, -smid)
                s_to = min((xmax - x) - smid, smid + 1)
                t_from = max(tmid - y, -tmid)
                t_to = min((ymax - y) - tmid, tmid + 1)
                value = 0
                for s in range(s_from, s_to):
                    for t in range(t_from, t_to):
                        v = x - smid + s
                        w = y - tmid + t
                        value += g[smid - s, tmid - t] * f[v, w]
                h[x, y] = value
        return h

After building this and continuing my (very informal) benchmarks, I get:

 
    In [21]: import convolve2
    In [22]: %timeit -n2 -r3 convolve2.naive_convolve(f, g)
    2 loops, best of 3: 828 ms per loop
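The typed listing warns that a C-typed accumulator wraps around on overflow. The effect can be sketched without compiling anything by using a fixed-width NumPy array element as a stand-in for the C variable (an analogy only, not what Cython generates), next to a Python integer, which simply keeps growing:

```python
import numpy as np

largest = np.iinfo(np.int64).max     # 2**63 - 1

python_value = int(largest) + 1      # Python int: grows to 2**63

boxed = np.array([largest], dtype=np.int64)
boxed += 1                           # fixed-width storage: wraps to -2**63

print(python_value)
print(boxed[0])
```

If the products summed in the inner loop can exceed the range of the chosen DTYPE, a wider type for the accumulator is worth considering.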
          

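Efficient indexing

There is still a bottleneck killing performance: the array lookups and assignments. The []-operator still uses full Python operations; what we want instead is to access the data buffer directly at C speed.

What we need to do, then, is type the contents of the ndarray objects. We do this with a special "buffer" syntax, which must be told the datatype (first argument) and the number of dimensions (the "ndim" keyword-only argument; if not provided, one dimension is assumed).

These are the needed changes: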

 
    ...
    def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    ...
    cdef np.ndarray[DTYPE_t, ndim=2] h = ...

Usage:

 
    In [18]: import convolve3
    In [19]: %timeit -n3 -r100 convolve3.naive_convolve(f, g)
    3 loops, best of 100: 11.6 ms per loop
          

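Note the importance of this change.

Gotcha: this efficient indexing only affects certain index operations, namely those with exactly ndim typed integer indices. So if v, for instance, isn't typed, then the lookup f[v, w] isn't optimized. On the other hand, this means you can continue using Python objects for sophisticated dynamic slicing and so on, just as when the array is not typed.

Tuning indexing further

The array lookups are still slowed down by two factors:

- Bounds checking is performed.

- Negative indices are checked for and handled correctly.

The code above is explicitly written so that it doesn't use negative indices, and it (hopefully) always accesses within bounds. We can add decorators to disable both checks: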

 
    ...
    cimport cython
    @cython.boundscheck(False)  # turn off bounds-checking for entire function
    @cython.wraparound(False)   # turn off negative index wrapping for entire function
    def naive_convolve(np.ndarray[DTYPE_t, ndim=2] f, np.ndarray[DTYPE_t, ndim=2] g):
    ...

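Now bounds checking is not performed (and, as a side effect, if you do happen to access out of bounds, you will in the best case crash your program and in the worst case corrupt data). Bounds-checking mode can be switched in many ways; see "Compiler directives" for more information.

We have also disabled the check that wraps negative indices (e.g. g[-1] giving the last value). As with disabling bounds checking, bad things will happen if we actually use negative indices with this disabled.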
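For reference, these are the two checks as they behave in ordinary Python indexing, shown on a 9x9 array like the kernel g from the benchmark. With both directives switched off, neither the wrap nor the exception is available in the compiled function.

```python
import numpy as np

g = np.arange(81).reshape((9, 9))

# Negative index wrapping: -1 is translated to the last valid index.
last = g[-1, -1]
print(last == g[8, 8])

# Bounds checking: an index past the end raises instead of reading
# arbitrary memory.
try:
    g[9, 0]
    out_of_bounds_caught = False
except IndexError:
    out_of_bounds_caught = True
print(out_of_bounds_caught)
```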

Function call overhead now starts to play a role, so we compare the latter two examples with a larger N:

 
    In [11]: %timeit -n3 -r100 convolve4.naive_convolve(f, g)
    3 loops, best of 100: 5.97 ms per loop
    In [12]: N = 1000
    In [13]: f = np.arange(N*N, dtype=np.int).reshape((N,N))
    In [14]: g = np.arange(81, dtype=np.int).reshape((9, 9))
    In [17]: %timeit -n1 -r10 convolve3.naive_convolve(f, g)
    1 loops, best of 10: 1.16 s per loop
    In [18]: %timeit -n1 -r10 convolve4.naive_convolve(f, g)
    1 loops, best of 10: 597 ms per loop
          

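(This is also a mixed benchmark, as the result array is allocated within the function call.)

Warning

Speed comes at some cost. In particular, it can be dangerous to set typed objects (like f, g and h in our sample code) to None. Setting such objects to None is entirely legal, but all you can do with them is check whether they are None. All other uses (attribute lookup or indexing) can potentially segfault or corrupt data (rather than raising exceptions as they would in Python).

The actual rules are a bit more complicated, but the main message is clear: do not use typed objects without knowing that they are not set to None.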
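One defensive option is to validate arguments in plain Python before they reach a typed function. The helper below is a hypothetical sketch (the name `require_2d` and its checks are assumptions for illustration, not part of Cython or NumPy):

```python
import numpy as np


def require_2d(arr, dtype, name):
    """Reject None and mismatched inputs before a typed call (hypothetical)."""
    if arr is None:
        raise TypeError(f"{name} must not be None")
    if not isinstance(arr, np.ndarray):
        raise TypeError(f"{name} must be a NumPy array, got {type(arr).__name__}")
    if arr.ndim != 2:
        raise ValueError(f"{name} must be 2-dimensional, got ndim={arr.ndim}")
    if arr.dtype != dtype:
        raise ValueError(f"{name} must have dtype {dtype}, got {arr.dtype}")
    return arr


image = np.zeros((4, 4), dtype=np.int64)
checked = require_2d(image, np.int64, "f")

try:
    require_2d(None, np.int64, "g")
    none_rejected = False
except TypeError:
    none_rejected = True
```

The checks cost a few attribute lookups per call, which is negligible next to the convolution loops.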

More generic code

It would be possible to do:

 
           def naive_convolve(object[DTYPE_t, ndim=2] f, ...):  
          

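i.e. use object rather than np.ndarray. Under Python 3.0 this can allow your algorithm to work with any library supporting the buffer interface; support for e.g. the Python Imaging Library could easily be added if someone is also interested in Python 2.x.

There is some speed penalty to this, though (more assumptions are made at compile time if the type is set to np.ndarray; specifically, the data is assumed to be stored in pure strided mode and not in indirect mode).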
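The buffer interface and the "strided" layout mentioned here can be inspected from Python. This sketch shows the strides of a C-contiguous 2D array, the same information seen through a `memoryview`, and a NumPy array sharing memory with a standard-library `array.array` through the buffer interface:

```python
import array

import numpy as np

a = np.arange(6, dtype=np.int64).reshape((2, 3))
# Bytes to step per axis: one row is 3 items of 8 bytes, one column is 8 bytes.
print(a.strides)

view = memoryview(a)
print(view.ndim, view.shape, view.strides)

# A non-NumPy object that exports a buffer; frombuffer wraps it without copying.
raw = array.array("d", [1.0, 2.0, 3.0])
shared = np.frombuffer(raw, dtype=np.float64)
shared[0] = 10.0
print(raw[0])
```

Writing through `shared` changes `raw`, since both names refer to the same memory block.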

----

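However, in experiments with my own code, where each individual operation is small and data has to be created and released many times, the speedup did not show up:

- Operations on list were more efficient than on np.array

- With identical code, Python and Cython ran at the same speed, with no improvement

- In short, after going through all of the above, the performance gain was zero (not a good fit for this case; left for later study)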
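One plausible reading of the first observation is that element-by-element access from interpreted Python pays the boxing cost on every lookup. The sketch below times the same Python-level loop over a list and over a NumPy array; the sizes and repeat counts are arbitrary assumptions, and the outcome should be measured rather than taken for granted.

```python
import timeit

import numpy as np

values_list = list(range(1000))
values_array = np.arange(1000)


def loop_sum(values):
    total = 0
    for i in range(len(values)):
        total += values[i]   # one Python-level lookup per element
    return total


list_time = min(timeit.repeat(lambda: loop_sum(values_list), number=100, repeat=3))
array_time = min(timeit.repeat(lambda: loop_sum(values_array), number=100, repeat=3))

print(f"list:  {list_time:.4f} s for 100 calls")
print(f"array: {array_time:.4f} s for 100 calls")
```

If the array loop turns out slower, that fits the tutorial's point: the gain comes from typed, C-level indexing, not from merely holding the data in an ndarray.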
