1. To process data byte by byte, use struct
https://docs.python.org/2/library/struct.html
By default, C types are represented in the machine’s native format and byte order, and properly aligned by skipping pad bytes if necessary (according to the rules used by the C compiler).
Alternatively, the first character of the format string can be used to indicate the byte order, size and alignment of the packed data, according to the following table:
Character | Byte order | Size | Alignment |
---|---|---|---|
@ | native | native | native |
= | native | standard | none |
< | little-endian | standard | none |
> | big-endian | standard | none |
! | network (= big-endian) | standard | none |
If the first character is not one of these, '@' is assumed.
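To make the effect of the byte-order prefix concrete, the short sketch below packs one 32-bit value with each of the standard prefixes. The variable names and the test value are illustrative only.

```python
import struct

value = 0x01020304  # arbitrary 32-bit test value

little = struct.pack('<I', value)   # least significant byte first
big = struct.pack('>I', value)      # most significant byte first
network = struct.pack('!I', value)  # network order, identical to big-endian

print(repr(little))
print(repr(big))
print(network == big)
```

The native prefixes '@' and '=' produce one of these two byte orders depending on the host machine, so code that exchanges data with other machines is usually safer with an explicit '<', '>' or '!'.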
Format characters have the following meaning; the conversion between C and Python values should be obvious given their types. The ‘Standard size’ column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of '<', '>', '!' or '='. When using native size, the size of the packed value is platform-dependent.
Format | C Type | Python type | Standard size | Notes |
---|---|---|---|---|
x | pad byte | no value | | |
c | char | string of length 1 | 1 | |
b | signed char | integer | 1 | (3) |
B | unsigned char | integer | 1 | (3) |
? | _Bool | bool | 1 | (1) |
h | short | integer | 2 | (3) |
H | unsigned short | integer | 2 | (3) |
i | int | integer | 4 | (3) |
I | unsigned int | integer | 4 | (3) |
l | long | integer | 4 | (3) |
L | unsigned long | integer | 4 | (3) |
q | long long | integer | 8 | (2), (3) |
Q | unsigned long long | integer | 8 | (2), (3) |
f | float | float | 4 | (4) |
d | double | float | 8 | (4) |
s | char[] | string | | |
p | char[] | string | | |
P | void * | integer | | (5), (3) |
Notes:
1. The '?' conversion code corresponds to the _Bool type defined by C99. If this type is not available, it is simulated using a char. In standard mode, it is always represented by one byte. New in version 2.6.
2. The 'q' and 'Q' conversion codes are available in native mode only if the platform C compiler supports C long long, or, on Windows, __int64. They are always available in standard modes. New in version 2.2.
3. When attempting to pack a non-integer using any of the integer conversion codes, if the non-integer has a __index__() method then that method is called to convert the argument to an integer before packing. If no __index__() method exists, or the call to __index__() raises TypeError, then the __int__() method is tried. However, the use of __int__() is deprecated, and will raise DeprecationWarning. Changed in version 2.7: Use of the __index__() method for non-integers is new in 2.7. Changed in version 2.7: Prior to version 2.7, not all integer conversion codes would use the __int__() method to convert, and DeprecationWarning was raised only for float arguments.
4. For the 'f' and 'd' conversion codes, the packed representation uses the IEEE 754 binary32 (for 'f') or binary64 (for 'd') format, regardless of the floating-point format used by the platform.
5. The 'P' format character is only available for the native byte ordering (selected as the default or with the '@' byte order character). The byte order character '=' chooses to use little- or big-endian ordering based on the host system. The struct module does not interpret this as native ordering, so the 'P' format is not available.
A format character may be preceded by an integral repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.
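The repeat count and the standard sizes from the table can be checked with struct.calcsize. The following is a small sketch; the variable names are made up for illustration.

```python
import struct

# A repeat count is shorthand for writing the same code several times.
same_size = struct.calcsize('<4h') == struct.calcsize('<hhhh')

# Standard sizes are fixed whenever the format starts with '<', '>', '!' or '='.
standard_sizes = dict((code, struct.calcsize('<' + code)) for code in 'bhiqfd')

# Standard mode inserts no padding: 1 byte (b) + 4 bytes (i) = 5 bytes.
packed_size = struct.calcsize('<bi')

# Native mode may insert pad bytes before the int, so this value is platform-dependent.
native_size = struct.calcsize('@bi')

print(same_size)
print(standard_sizes)
print(packed_size)
print(native_size)
```

On many platforms native_size is 8 rather than 5, because the C compiler aligns the int on a 4-byte boundary. That difference is the practical meaning of the Alignment column in the first table.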
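Example:
Suppose we have this C struct:
struct Header
{
    unsigned short id;
    char tag[4];
    unsigned int version;
    unsigned int count;
};
Suppose the data for one such Header has been received via socket.recv and stored in the string s. It can be parsed with the unpack() function: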
import struct
id, tag, version, count = struct.unpack("!H4s2I", s)
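In the format string above, ! means the data is parsed in network byte order, because it was received from the network, where it travels in network byte order. It also selects standard sizes with no alignment padding. The H that follows stands for the unsigned short id, 4s for a 4-byte string, and 2I for two unsigned int values.
After this single unpack call, id, tag, version, and count hold the parsed values.
Likewise, local values can easily be packed back into the struct layout:
ss = struct.pack("!H4s2I", id, tag, version, count)
pack converts id, tag, version, and count into the Header layout described by the format. ss is now a string (in effect a byte stream laid out like the C struct) that can be sent with socket.send(ss).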
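The round trip described above can be sketched without a socket by packing a header and unpacking it again. The field values and the names HEADER_FMT, wire and msg_id are hypothetical, and the byte strings use the b'' prefix so that the sketch also runs on Python 3.

```python
import struct

HEADER_FMT = '!H4s2I'  # id, tag, version, count in network byte order

# Hypothetical field values standing in for a packet received from the network.
wire = struct.pack(HEADER_FMT, 7, b'DEMO', 1, 42)

# '!' uses standard sizes and no padding: 2 + 4 + 4 + 4 = 14 bytes.
header_size = struct.calcsize(HEADER_FMT)

# Slice off exactly one header before unpacking; unpack rejects buffers of the wrong length.
msg_id, tag, version, count = struct.unpack(HEADER_FMT, wire[:header_size])

print(header_size)
print((msg_id, tag, version, count))
```

One caveat: a C compiler will typically insert two pad bytes after tag so that version is 4-byte aligned, which makes the in-memory struct 16 bytes. The '!' format describes the 14-byte unpadded layout, so this sketch assumes the sender serializes the fields one by one rather than sending the raw struct memory.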
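Example 2:
import struct
a = 12.34
# Convert a to binary. 'd' is an 8-byte double; the integer code 'i' cannot represent 12.34.
data = struct.pack('d', a)
data is now a string whose bytes are identical to the binary representation of a.
Now the reverse operation.
Given the binary data in data (really just a string), convert it back into a Python value:
a, = struct.unpack('d', data)
Note that unpack always returns a tuple.
So even if only one value was packed:
data = struct.pack('d', a)
it has to be decoded like this:
a, = struct.unpack('d', data)    or    (a,) = struct.unpack('d', data)
With a plain a = struct.unpack('d', data), a would be (12.34,), a tuple rather than the original float.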
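The choice of format code matters for floats. The sketch below, with illustrative variable names, compares a round trip through 'd' (binary64) with one through 'f' (binary32), and shows the single-element tuple that unpack returns.

```python
import struct

a = 12.34

# unpack always returns a tuple, even for a single value.
result = struct.unpack('<d', struct.pack('<d', a))

# 'd' stores a 64-bit double, the same precision as a Python float.
(from_double,) = result

# 'f' stores a 32-bit float, so the value is rounded when it is packed.
(from_float,) = struct.unpack('<f', struct.pack('<f', a))

print(type(result).__name__)
print(from_double == a)
print(from_float == a)
print(abs(from_float - a) < 1e-6)
```

The 'f' round trip returns a value close to 12.34 but not equal to it, so comparisons on values read back from 'f' fields should use a tolerance.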
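If the data consists of several values, it can be packed like this:
# The b'' prefix is accepted by Python 2.6+ and required by Python 3 for the 's' code.
a = b'hello'
b = b'world!'
c = 2
d = 45.123
data = struct.pack('5s6sif', a, b, c, d)
data now holds the values in binary form and can be written straight to a file, for example binfile.write(data).
Later, when needed, it can be read back with data = binfile.read()
and decoded into Python variables with struct.unpack():
a, b, c, d = struct.unpack('5s6sif', data)
'5s6sif' is the fmt argument, the format string. It is made up of optional counts followed by format characters: 5s is a string occupying 5 bytes, 2i would be two integers, and so on. The available characters and their C types are listed in the table above; each C type maps to a corresponding Python type. Because 'f' is a 32-bit float, d comes back as approximately 45.123 rather than exactly that value.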
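A complete write-then-read cycle might look like the sketch below. It is a variation on the format above rather than the same code: the '<' prefix is added so that the file layout does not depend on the machine's native alignment, and a temporary file stands in for binfile. RECORD_FMT and the other names are hypothetical.

```python
import os
import struct
import tempfile

RECORD_FMT = '<5s6sif'  # explicit little-endian, standard sizes, no padding

record = struct.pack(RECORD_FMT, b'hello', b'world!', 2, 45.123)

# Write the record to a temporary file, then read it back.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, 'wb') as binfile:
        binfile.write(record)
    with open(path, 'rb') as binfile:
        raw = binfile.read(struct.calcsize(RECORD_FMT))
finally:
    os.remove(path)

a, b, c, d = struct.unpack(RECORD_FMT, raw)
print((a, b, c))
print(abs(d - 45.123) < 1e-5)
```

With the '<' prefix the record is 5 + 6 + 4 + 4 = 19 bytes on every platform. Without a prefix, native alignment usually adds a pad byte before the int.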
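Example 3:
import struct
# file_name is the path of the binary file to read
with open(file_name, "rb") as f:
    short_data = struct.unpack('<h', f.read(2))[0]  # little-endian 16-bit signed integer
    float_data = struct.unpack('<f', f.read(4))[0]  # little-endian 32-bit float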
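When many fields are read one after another, it can help to let struct.calcsize decide how many bytes to read and to check for a short read. The helper read_fields below is a hypothetical name, and an in-memory io.BytesIO stream stands in for the file.

```python
import io
import struct

# Stand-in for a binary file: a little-endian int16 followed by a float32.
stream = io.BytesIO(struct.pack('<hf', -3, 1.5))

def read_fields(f, fmt):
    """Read exactly calcsize(fmt) bytes from f and unpack them (hypothetical helper)."""
    size = struct.calcsize(fmt)
    chunk = f.read(size)
    if len(chunk) != size:
        # A short read means the file ended in the middle of a record.
        raise EOFError('expected %d bytes, got %d' % (size, len(chunk)))
    return struct.unpack(fmt, chunk)

short_data, float_data = read_fields(stream, '<hf')
print((short_data, float_data))
```

Reading both fields with one format string avoids hard-coding the byte counts 2 and 4 next to the format codes, which is an easy place for the two to drift apart.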
2. Some protocols define field widths in bits (3 bits wide, 7 bits wide, and so on). struct is a poor fit for such fields,
but the bitstring package handles them fairly simply:
https://pypi.org/project/bitstring/
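Code example:
import bitstring
with open(file_name, "rb") as f:
    file_b = bitstring.BitStream(bytes=f.read())
print(file_b.read(3).int)   # next 3 bits as a signed integer
print(file_b.read(3).int)
print(file_b.read(7).uint)  # 7 bits as an unsigned integer; .bytes needs a multiple of 8 bits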
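For comparison, the same 3-bit, 3-bit and 7-bit field widths can be extracted without a third-party package by using shifts and masks. The sketch below is only meant to show what reading bit fields involves; read_bits is a hypothetical helper, not part of struct or bitstring, and it returns unsigned values, whereas .int in the example above interprets the bits as a signed number.

```python
def read_bits(data, bit_offset, width):
    """Return `width` bits of `data` starting at `bit_offset` as an unsigned int.

    Bits are counted from the most significant bit of the first byte.
    Hypothetical helper for illustration only.
    """
    value = 0
    for byte in bytearray(data):
        value = (value << 8) | byte  # treat the whole buffer as one big integer
    shift = 8 * len(data) - bit_offset - width
    if bit_offset < 0 or width < 0 or shift < 0:
        raise ValueError('requested bits fall outside the buffer')
    return (value >> shift) & ((1 << width) - 1)

data = b'\xb5\x30'  # bit pattern: 101 101 0100110 000

first = read_bits(data, 0, 3)   # bits 0-2
second = read_bits(data, 3, 3)  # bits 3-5
third = read_bits(data, 6, 7)   # bits 6-12

print((first, second, third))
```

Tracking the bit offset by hand gets tedious and error-prone once a format has more than a few fields, which is the bookkeeping that a stream-style reader such as bitstring takes care of.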
A structured format can also be defined and packed in one call (the ... entries mark fields left out here):
fmt = ('sequence_header_code, '
       'uint:12=horizontal_size_value, '
       'uint:12=vertical_size_value, '
       'uint:4=aspect_ratio_information, '
       '... ')
d = {'sequence_header_code': '0x000001b3',
     'horizontal_size_value': 352,
     'vertical_size_value': 288,
     'aspect_ratio_information': 1,
     ... }
s = bitstring.pack(fmt, **d)