High-Performance Python with Zero-Copy and the Buffer Protocol

Reposted article, 2019-01-24 15:34. Tags: High performance / Zero Copy / Buffer Protocol / Python

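Whatever your program does, it often has to process large amounts of data. Most of that data takes the form of strings. However, copying, slicing, and modifying strings in bulk is quite inefficient. Why?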

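Consider an example that reads a large file of binary data and then copies part of that data into another file. To see how much memory the program uses, we rely on memory_profiler, a powerful Python package that reports a program's memory usage line by line.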

@profile
def read_random():
    with open("/dev/urandom", "rb") as source:
        content = source.read(1024 * 10000)
        content_to_write = content[1024:]
    print(f"content length: {len(content)}, content to write length {len(content_to_write)}")
    with open("/dev/null", "wb") as target:
        target.write(content_to_write)


if __name__ == "__main__":
    read_random()

Running the program above under the memory_profiler module produces the following output:

$ python -m memory_profiler example.py 
content length: 10240000, content to write length 10238976
Filename: example.py

Line #    Mem usage    Increment   Line Contents
================================================
     1   14.320 MiB   14.320 MiB   @profile
     2                             def read_random():
     3   14.320 MiB    0.000 MiB       with open("/dev/urandom", "rb") as source:
     4   24.117 MiB    9.797 MiB           content = source.read(1024 * 10000)
     5   33.914 MiB    9.797 MiB           content_to_write = content[1024:]
     6   33.914 MiB    0.000 MiB       print(f"content length: {len(content)}, content to write length {len(content_to_write)}")
     7   33.914 MiB    0.000 MiB       with open("/dev/null", "wb") as target:
     8   33.914 MiB    0.000 MiB           target.write(content_to_write)

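We load roughly 10 MB of data from /dev/urandom with source.read. Python has to allocate about 10 MB of memory to store that data as a string (a bytes object). The content[1024:] statement that follows copies the data while skipping the first kilobyte, and it also allocates about 10 MB.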

The interesting part is the 10 MB growth in program memory when content_to_write is built. The slice operation copies everything except the first kilobyte into a new string object.

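Performing this kind of operation on large byte arrays is a disaster. If you have written C before, you know the caveat that comes with memcpy(): copying memory is expensive, both in memory usage and in overall performance.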

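As a C programmer, however, you also know that a string is just an array of characters, and that you can work on part of it without copying anything by using basic pointer arithmetic, provided the whole string lives in a contiguous region of memory.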

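Python offers the same capability through the buffer protocol. The buffer protocol is defined in PEP 3118, which describes the C API that lets various types, such as strings, support it.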

When an object implements this protocol, you can use the memoryview class to construct a memoryview object that references the original object's memory.

>>> s = b"abcdefgh"
>>> view = memoryview(s)
>>> view[1]
98
>>> limited = view[1:3]
>>> limited
<memory at 0x7f6ff2df1108>
>>> bytes(view[1:3])
b'bc'

Note: 98 is the ASCII code of the character b.

In the example above, slicing a memoryview object returns another memoryview object. That means no data is copied; the slice simply references a portion of the original data.
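The claim that a slice shares memory with the original object can be checked directly. The following is a minimal sketch, assuming a bytearray as the underlying object so that it can be modified; the variable names are made up for illustration and do not come from the original article.

```python
# Illustrative sketch: all names below are hypothetical.
data = bytearray(b"abcdefgh")     # a mutable object implementing the buffer protocol
view = memoryview(data)
window = view[1:3]                # a new memoryview, not a copy of the bytes

# Writing through the slice changes the original object, so the memory is shared.
window[0] = ord("X")

# The slice still reports the original object as the owner of the memory.
same_owner = window.obj is data

# str does not implement the buffer protocol, so memoryview() rejects it.
try:
    memoryview("abcdefgh")
    str_supported = True
except TypeError:
    str_supported = False
```

A bytes object would work for reading in the same way, but its memoryview is read-only, so the write-through step needs a mutable type such as bytearray.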

[Figure: a memoryview slice referencing part of the original object's memory instead of copying it]

We can therefore make the earlier program more efficient. Instead of allocating a new string, we use a memoryview object to reference the data.

@profile
def read_random():
    with open("/dev/urandom", "rb") as source:
        content = source.read(1024 * 10000)
        content_to_write = memoryview(content)[1024:]
    print(f"content length: {len(content)}, content to write length {len(content_to_write)}")
    with open("/dev/null", "wb") as target:
        target.write(content_to_write)


if __name__ == "__main__":
    read_random()

We run the program under memory_profiler once more:

$ python -m memory_profiler example.py 
content length: 10240000, content to write length 10238976
Filename: example.py

Line #    Mem usage    Increment   Line Contents
================================================
     1   14.219 MiB   14.219 MiB   @profile
     2                             def read_random():
     3   14.219 MiB    0.000 MiB       with open("/dev/urandom", "rb") as source:
     4   24.016 MiB    9.797 MiB           content = source.read(1024 * 10000)
     5   24.016 MiB    0.000 MiB           content_to_write = memoryview(content)[1024:]
     6   24.016 MiB    0.000 MiB       print(f"content length: {len(content)}, content to write length {len(content_to_write)}")
     7   24.016 MiB    0.000 MiB       with open("/dev/null", "wb") as target:
     8   24.016 MiB    0.000 MiB           target.write(content_to_write)

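In this version, source.read still allocates about 10 MB of memory to hold the file contents. Referencing part of that content through a memoryview, however, allocates no additional memory.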

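Compared with the previous version, this halves the memory allocated for the data (about 9.8 MiB instead of about 19.6 MiB) and lowers the total from 33.9 MiB to 24.0 MiB.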
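The same comparison can be reproduced without an external package. The sketch below swaps in the standard-library tracemalloc module, which is not what the article uses and which tracks only allocations made by Python rather than whole-process memory. The helper name peak_bytes and the zero-filled test data are assumptions made for illustration.

```python
import tracemalloc

SIZE = 1024 * 10000  # same amount of data as in the article's example


def peak_bytes(make_slice, size=SIZE):
    """Return (peak traced bytes, slice length) for one slicing strategy."""
    content = bytes(size)          # allocated before tracing starts, so not counted
    tracemalloc.start()
    part = make_slice(content)     # only allocations made here are traced
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak, len(part)


# Slicing bytes directly copies the data into a new bytes object.
copy_peak, copy_len = peak_bytes(lambda c: c[1024:])

# Slicing a memoryview only creates small view objects.
view_peak, view_len = peak_bytes(lambda c: memoryview(c)[1024:])

print(f"copy: {copy_peak} bytes traced, view: {view_peak} bytes traced")
```

Both strategies yield a slice of the same length; only the copying variant should show an allocation on the order of the data size.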

This technique is extremely useful when working with sockets. When data is sent over a socket, a single call may not send all of it.

import socket
s = socket.socket(…)
s.connect(…)
# Build a bytes object containing the letter `a` more than 100 million times
data = b"a" * (1024 * 100000)
while data:
    sent = s.send(data)
    # Drop the first `sent` bytes, which have already been sent
    data = data[sent:]

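With the implementation above, the program copies the remaining data again and again until everything has been sent. With memoryview, the same job can be done in a zero-copy fashion, which performs better: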

import socket
s = socket.socket(…)
s.connect(…)
# Build a bytes object containing the letter `a` more than 100 million times
data = b"a" * (1024 * 100000)
mv = memoryview(data)
while mv:
    sent = s.send(mv)
    # Build a new memoryview object pointing to the data which remains to be sent
    mv = mv[sent:]

No copying takes place here, and no memory beyond the 100 MB allocated for data is needed to send it in several calls.
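The zero-copy send loop can be tried locally without a remote server. The following is a rough sketch, assuming a connected pair from socket.socketpair() and a background thread that drains the receiving end; the helper names send_all and drain and the 1 MB payload size are hypothetical choices for this illustration.

```python
import socket
import threading


def send_all(sock, data):
    """Send every byte of data, advancing a memoryview instead of copying."""
    mv = memoryview(data)
    total = 0
    while mv:
        sent = sock.send(mv)   # may send only part of the view
        mv = mv[sent:]         # new view over the unsent remainder, no copy
        total += sent
    return total


def drain(sock, expected, out):
    """Receive until `expected` bytes have arrived, then record the count."""
    received = 0
    while received < expected:
        chunk = sock.recv(65536)
        if not chunk:
            break
        received += len(chunk)
    out.append(received)


left, right = socket.socketpair()
payload = b"a" * (1024 * 1000)
result = []
reader = threading.Thread(target=drain, args=(right, len(payload), result))
reader.start()
sent_total = send_all(left, payload)
left.close()
reader.join()
right.close()
```

The standard library's socket.sendall() already loops until everything is sent; the explicit loop here only makes the role of the memoryview visible.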

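So far we have used memoryview objects to write data efficiently, but the same idea applies to reading in some cases. Most I/O operations in Python know how to work with objects that implement the buffer protocol. In the following example we do not even need a memoryview object; we can ask an I/O function to write into a pre-allocated object: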

>>> ba = bytearray(8)
>>> ba
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00')
>>> with open("/dev/urandom", "rb") as source:
...     source.readinto(ba)
... 
8
>>> ba
bytearray(b'`m.z\x8d\x0fp\xa1')

With this mechanism, it is easy to fill a pre-allocated buffer, much as you would in C to avoid repeated calls to malloc().
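One place where a pre-allocated buffer pays off is a chunked copy loop that reuses the same buffer for every read. The following is a minimal sketch, assuming in-memory io.BytesIO streams in place of real files; the function name copy_stream and the 4096-byte chunk size are arbitrary choices for illustration.

```python
import io


def copy_stream(source, target, chunk_size=4096):
    """Copy source to target, reusing one pre-allocated buffer for every read."""
    buf = bytearray(chunk_size)
    view = memoryview(buf)
    copied = 0
    while True:
        n = source.readinto(buf)      # fills buf in place, returns bytes read
        if not n:                     # 0 at end of stream
            break
        target.write(view[:n])        # write only the filled part, without copying
        copied += n
    return copied


src = io.BytesIO(b"x" * 10000)
dst = io.BytesIO()
total = copy_stream(src, dst)
```

Slicing the memoryview rather than the bytearray matters here: buf[:n] would create a new bytearray on every iteration.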

Using memoryview, you can even place data at an arbitrary position within a memory area:

>>> ba = bytearray(8)
>>> # Reference the _bytearray_ from offset 4 to its end
>>> ba_at_4 = memoryview(ba)[4:]
>>> with open("/dev/urandom", "rb") as source:
... # Write the content of /dev/urandom from offset 4 to the end of the
... # bytearray, effectively reading 4 bytes only
...     source.readinto(ba_at_4)
... 
4
>>> ba
bytearray(b'\x00\x00\x00\x00\x0b\x19\xae\xb2')
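The same idea extends to filling separate regions of one buffer from different sources. The following is a small sketch, assuming two in-memory io.BytesIO streams in place of /dev/urandom so that the result is deterministic; the names header and body are hypothetical.

```python
import io

buf = bytearray(8)
view = memoryview(buf)

header = io.BytesIO(b"HEAD")
body = io.BytesIO(b"body-and-more")

# Each readinto() call writes only into the region covered by its view.
n_header = header.readinto(view[:4])   # fills offsets 0 to 3
n_body = body.readinto(view[4:])       # fills offsets 4 to 7, reading 4 bytes only
```

Because each view has a fixed length, a read can never spill past the region it was given.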

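Although Python hides all memory allocation, so that developers rarely have to care how it works internally, the buffer protocol is the foundation for low memory overhead and high performance.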

It is also worth looking at how the array and struct modules work with the buffer protocol; zero-copy operations are remarkably efficient.
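As a starting point, the sketch below shows one way each module can work on an existing buffer in place. The packet layout (a 2-byte type followed by a 4-byte length, little-endian) and all values are invented for illustration.

```python
import array
import struct

# struct can pack into and unpack from an existing buffer in place.
packet = bytearray(8)
struct.pack_into("<HI", packet, 0, 7, 123456)     # "<HI" occupies 2 + 4 = 6 bytes
msg_type, length = struct.unpack_from("<HI", packet, 0)

# array.array implements the buffer protocol, so a memoryview can slice it
# without copying; items keep the array's type code and item size.
numbers = array.array("i", [1, 2, 3, 4])
view = memoryview(numbers)
tail = view[2:]
tail[0] = 30          # writes through to the underlying array
```

After the last line, numbers holds 1, 2, 30, 4, because tail refers to the array's own memory.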
