Python error: UnicodeEncodeError 'gbk' codec can't encode character

Reposted article · 2019-01-08 16:17 · python

Today, while using Python file handling to write a page fetched from the web to disk, I ran into this error: UnicodeEncodeError: 'gbk' codec can't encode character '\xa0' in position …

Code:

import urllib.request  # similar to "from urllib import request", which binds the name request instead

response = urllib.request.urlopen("http://www.baidu.com")
print("Type of the response object:", type(response))
page_contect = response.read()
with open(r'C:\Users\PINPIN\Desktop\docx\123.txt','w+') as f1:
    f1.write(page_contect.decode('utf-8'))

The error:

Type of the response object: <class 'http.client.HTTPResponse'>

Traceback (most recent call last):
  File "C:\Users\PINPIN\Desktop\docx\url_test.py", line 6, in <module>
    f1.write(page_contect.decode('utf-8'))
UnicodeEncodeError: 'gbk' codec can't encode character '\xbb' in position 29150: illegal multibyte sequence

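Cause: when open() is called in text mode without an encoding argument, Python uses the locale's preferred encoding, which is gbk on a Chinese-language Windows system. The data passed to write() has already been decoded into a Unicode str, so Python must encode it back to bytes with gbk when writing to the file. The page contains characters such as '\xbb' that gbk cannot represent, so that encoding step fails.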
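The cause can be reproduced without any network access or file. The following is a minimal sketch: the helper name can_encode is made up for illustration, and the character '\xbb' is taken from the traceback above. It prints the encoding that open() would fall back to on the current machine, then checks whether the character can be encoded with gbk and with utf-8.

```python
import locale

# Encoding that open() uses in text mode when no encoding is passed.
# On a Chinese-language Windows system this is typically cp936 (gbk);
# the value printed here depends on the machine running the code.
print(locale.getpreferredencoding(False))


def can_encode(text, codec):
    """Hypothetical helper: True if text can be encoded with codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


char = "\xbb"  # the character reported in the traceback

gbk_ok = can_encode(char, "gbk")
utf8_ok = can_encode(char, "utf-8")

print("gbk:", gbk_ok)
print("utf-8:", utf8_ok)
```

Since utf-8 can represent every Unicode character, it is a safe target encoding for arbitrary web pages, whereas gbk only covers a subset.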

Solution: change the encoding of the target file.

When opening the file, specify its encoding with encoding='utf-8':

with open(r'C:\Users\PINPIN\Desktop\docx\123.txt','w+',encoding='utf-8') as f1:
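As a quick check of the fix, here is a small sketch that writes text containing characters gbk cannot encode and reads it back. The sample string and the temporary file path are assumptions for illustration, not part of the original script; the keyword argument is encoding, as accepted by the built-in open().

```python
import os
import tempfile

# Hypothetical sample text with characters that gbk cannot encode.
sample = "left \xbb right\xa0end"

# Temporary location standing in for the original 123.txt path.
path = os.path.join(tempfile.mkdtemp(), "page.txt")

# Passing encoding explicitly makes the result independent of the locale.
with open(path, "w", encoding="utf-8") as f1:
    f1.write(sample)

# Read back with the same encoding to confirm nothing was lost.
with open(path, "r", encoding="utf-8") as f1:
    read_back = f1.read()

print(read_back == sample)
```

Whatever later reads the file needs to use the same encoding; opening it again with the locale default on such a Windows system may produce garbled text or a decoding error.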

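Also: data read from the network arrives as bytes in the encoding of its source. For a web page, that is the page's encoding. It must be decoded into a str with decode() before being written to a text-mode file; otherwise a different error is raised: TypeError: write() argument must be str, not bytes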

That is why the call here is f1.write(page_contect.decode('utf-8')): decode('utf-8') converts the bytes into a str.
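The difference between bytes and str can be seen in a short offline sketch. The variable raw below is a stand-in for what response.read() returns, and the temporary path is likewise an assumption. Writing the bytes directly to a text-mode file raises the TypeError; decoding first succeeds.

```python
import os
import tempfile

# Stand-in for response.read(): bytes in the page's encoding (utf-8 here).
raw = "left \xbb right".encode("utf-8")

path = os.path.join(tempfile.mkdtemp(), "page.txt")

with open(path, "w", encoding="utf-8") as f1:
    try:
        f1.write(raw)  # bytes passed to a text-mode file
        wrote_bytes = True
    except TypeError as exc:
        wrote_bytes = False
        message = str(exc)
        print("TypeError:", message)

    # Decoding turns the bytes into a str, which text mode accepts.
    f1.write(raw.decode("utf-8"))

with open(path, "r", encoding="utf-8") as f1:
    saved = f1.read()
```

If the goal is only to save the page unchanged, another option is to open the file in binary mode ('wb') and write the bytes as they are, which avoids decoding and encoding altogether.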
