Converting String Encodings in Python 3

Reposted article. 2019-07-06 21:57. Tags: python3, python, string encoding

While using subprocess to call a Windows command, I ran into a problem where the Chinese text in the output could not be displayed. The source code is as follows:

#-*-coding:utf-8-*-
__author__ = '$USER'

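import subprocess
p = subprocess.Popen('nslookup www.qq.com', stdout=subprocess.PIPE)
# communicate() drains the pipe and waits for exit; calling wait() first can deadlock on large output
out = p.communicate()  # (stdout_bytes, stderr); stderr is None because it was not piped
print('returncode：%d' % p.returncode)
for i in out:
    if i is not None:
        s = str(i, encoding='utf-8')  # raises UnicodeDecodeError: the bytes are not UTF-8
        print(s)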

The output is:

returncode：0
File "F:/TECH/python/LearnPython100Days/subprocessSample.py", line 11, in <module>
s = str(i, encoding='utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 0: invalid start byte

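The traceback shows that decoding the output bytes as UTF-8 fails. This is because the Windows command line here uses the GBK encoding (visible in the console window's properties) rather than UTF-8, so the bytes cannot be decoded as UTF-8 directly. Changing the code to: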

s = str(i, encoding='GBK')

gives the correct output:

returncode：0
服务器:  UnKnown
Address:  211.137.130.3

名称:    https.qq.com
Addresses:  2402:4e00:8030:1::7d
	  121.51.142.21
Aliases:  www.qq.com
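
Decoding by hand is not the only option: since Python 3.6, the subprocess functions accept an `encoding` argument and return text directly. The following is a minimal sketch of that alternative, not the author's code. To keep it runnable on any operating system, it assumes a stand-in child process (a second Python interpreter) that writes a few GBK bytes, in place of nslookup.

```python
import subprocess
import sys

# Stand-in for nslookup (assumption): a child Python process that writes
# the first six GBK bytes of the output shown above to its stdout.
child_code = r"import sys; sys.stdout.buffer.write(b'\xb7\xfe\xce\xf1\xc6\xf7')"

result = subprocess.run(
    [sys.executable, '-c', child_code],
    stdout=subprocess.PIPE,
    encoding='gbk',  # subprocess decodes the child's stdout with this codec
)

# result.stdout is already a str, so no manual str(bytes, encoding=...) step is needed.
decoded = result.stdout
```

With a real console command, the value passed as `encoding` still has to match the code page the command actually writes in; 'gbk' is only correct on systems configured like the one in this article.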

In a project, to avoid garbled text, it is best to normalize all output to UTF-8. How can this be done?

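1. Decode the byte string with GBK to obtain a Chinese string;

2. Encode that Chinese string into a byte string with UTF-8;

3. Decode that byte string with UTF-8 to get a string again, which is the same Chinese text.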

The relevant code is as follows:

import chardet  # third-party package; belongs with the other imports at the top of the script

for i in out:
    if i is not None:
        # Raw console bytes, labeled with chardet's guess at their encoding
        print('Original byte string (%s):\n%s' % (chardet.detect(i)['encoding'], i))
        # Step 1: decode the GBK bytes into a str
        s = str(i, encoding='GBK')
        print('Chinese string:\n%s' % s)
        # Step 2: encode the str as UTF-8 bytes
        utf8_bytes = s.encode('UTF-8', 'ignore')
        print('Transcoded byte string (%s):\n%s' % (chardet.detect(utf8_bytes)['encoding'], utf8_bytes))
        # Step 3: decode the UTF-8 bytes back into a str
        utf8_str = utf8_bytes.decode('UTF-8')
        print('Transcoded Chinese string:\n%s' % utf8_str)

The output is:

returncode：0
Original byte string (ISO-8859-9):
b'\xb7\xfe\xce\xf1\xc6\xf7:  UnKnown\r\nAddress:  211.137.130.3\r\n\r\n\xc3\xfb\xb3\xc6:    https.qq.com\r\nAddresses:  2402:4e00:8030:1::7d\r\n\t  121.51.142.21\r\nAliases:  www.qq.com\r\n\r\n'
Chinese string:
服务器:  UnKnown
Address:  211.137.130.3

名称:    https.qq.com
Addresses:  2402:4e00:8030:1::7d
	  121.51.142.21
Aliases:  www.qq.com


Transcoded byte string (utf-8):
b'\xe6\x9c\x8d\xe5\x8a\xa1\xe5\x99\xa8:  UnKnown\r\nAddress:  211.137.130.3\r\n\r\n\xe5\x90\x8d\xe7\xa7\xb0:    https.qq.com\r\nAddresses:  2402:4e00:8030:1::7d\r\n\t  121.51.142.21\r\nAliases:  www.qq.com\r\n\r\n'
Transcoded Chinese string:
服务器:  UnKnown
Address:  211.137.130.3

名称:    https.qq.com
Addresses:  2402:4e00:8030:1::7d
	  121.51.142.21
Aliases:  www.qq.com
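
The three steps listed earlier can be condensed into a small helper. This is a minimal sketch rather than the author's code: the function name `transcode` and its parameter names are hypothetical, and the sample bytes are copied from the byte strings printed above.

```python
def transcode(data, source_encoding, target_encoding='utf-8'):
    """Re-encode bytes from one encoding to another (hypothetical helper)."""
    text = data.decode(source_encoding)    # step 1: bytes -> str
    return text.encode(target_encoding)    # step 2: str -> bytes


gbk_bytes = b'\xc3\xfb\xb3\xc6'  # four bytes taken from the GBK output above
utf8_bytes = transcode(gbk_bytes, 'gbk')
print(utf8_bytes)  # b'\xe5\x90\x8d\xe7\xa7\xb0'

# Step 3: decoding the new bytes as UTF-8 yields the same text as the GBK decode.
round_trip_ok = utf8_bytes.decode('utf-8') == gbk_bytes.decode('gbk')
print(round_trip_ok)  # True
```

Unlike the loop above, this sketch lets decoding errors propagate instead of passing 'ignore', so a wrong source encoding fails loudly rather than silently dropping characters.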

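Notes:

1. After the byte string is decoded with GBK and re-encoded with UTF-8, its byte values are different, even though the text it represents is the same;

2. The chardet module can detect the encoding of a byte string, but its result is not guaranteed to be accurate and should be treated only as a hint. Here it reported the first byte string, which is actually GBK, as 'ISO-8859-9';

3. In Python 3, a string (str) has an encode() method that returns a byte string, and no decode() method; correspondingly, a byte string (bytes) has a decode() method that returns a string, and no encode() method. This differs from Python 2.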
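
Notes 1 and 3 can be checked directly in an interpreter. The snippet below is a small illustrative sketch; the variable names are arbitrary, and the sample text is the two-character label from the nslookup output.

```python
text = '\u540d\u79f0'  # the two characters of the label "名称" from the output above

gbk_bytes = text.encode('gbk')
utf8_bytes = text.encode('utf-8')

# Note 1: the text is the same, but the byte values differ per encoding.
print(gbk_bytes)   # b'\xc3\xfb\xb3\xc6'
print(utf8_bytes)  # b'\xe5\x90\x8d\xe7\xa7\xb0'
same_text = gbk_bytes.decode('gbk') == utf8_bytes.decode('utf-8')
print(same_text)   # True

# Note 3: in Python 3, str has encode() but no decode(),
# and bytes has decode() but no encode().
str_has_decode = hasattr(text, 'decode')
bytes_has_encode = hasattr(gbk_bytes, 'encode')
print(str_has_decode, bytes_has_encode)  # False False
```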
