I often open Chinese-language text files with Python, forget to specify the encoding, and get an error. Recording the error here:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 5: illegal multibyte sequence
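The error can be reproduced on any platform. This is a minimal sketch that uses a throwaway temporary file in place of the real document. It writes Chinese text as UTF-8 and then reads it back as GBK, which is what open() does implicitly on a Chinese-locale Windows machine. The file name demo.txt is made up for the demo.

```python
import os
import tempfile

# Write Chinese text to a temporary file (hypothetical name), encoded as UTF-8.
text = "有中文內容的文檔\n第二行\n"
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading it back as GBK mimics open()'s default on a Chinese-locale Windows
# machine; the UTF-8 bytes are not a valid GBK sequence, so decoding fails.
error = None
try:
    with open(path, "r", encoding="gbk") as f:
        f.read()
except UnicodeDecodeError as exc:
    error = exc
    print(type(exc).__name__, exc)

# Reading with the encoding the file was actually saved in works.
with open(path, "r", encoding="utf-8") as f:
    content = f.read()
print(content == text)
```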
The code that triggers it:
    filename = '有中文內容的.txt'
    with open(filename, 'r') as file_object:   # no encoding given
        lines = file_object.readlines()         # list of all lines
    print(lines)
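You can check which encoding open() falls back to on your own machine when encoding is left out. The sketch below does that quick check. The printed values depend on your platform, locale, and how Python was started.

```python
import locale
import sys

# The encoding open() uses when encoding= is omitted
# (typically 'cp936', i.e. GBK, on a Chinese-locale Windows machine).
default_encoding = locale.getpreferredencoding(False)
print("open() default encoding:", default_encoding)

# Python 3.7+ "UTF-8 mode" (python -X utf8) overrides the locale
# and makes UTF-8 the default for open().
print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
```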
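The fix is to pass the encoding the file was actually saved in. When encoding is omitted, open() decodes with the platform's default locale encoding. On a Chinese-locale Windows system that default is GBK (cp936), so a UTF-8 file fails with the error above: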
    filename = '有中文內容的.txt'
    with open(filename, 'r', encoding='utf-8') as file_object:   # the file is UTF-8
        lines = file_object.readlines()
    print(lines)
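If you are not sure which encoding a file uses, one option is to try a few candidates in order. The helper read_text_with_fallback below is hypothetical and is not a standard-library function. The candidate list (utf-8, then gbk) is an assumption suited to Chinese text, and the demo file names are made up.

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=("utf-8", "gbk")):
    """Hypothetical helper: try each candidate encoding in order and
    return (text, encoding) for the first one that decodes cleanly."""
    for enc in encodings:
        try:
            with open(path, "r", encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue  # wrong guess; try the next candidate
    raise ValueError(f"could not decode {path!r} with any of {encodings}")

# Demo: the same Chinese text saved once as UTF-8 and once as GBK.
tmp = tempfile.mkdtemp()
for name, enc in (("utf8.txt", "utf-8"), ("gbk.txt", "gbk")):
    with open(os.path.join(tmp, name), "w", encoding=enc) as f:
        f.write("中文内容\n")

print(read_text_with_fallback(os.path.join(tmp, "utf8.txt")))
print(read_text_with_fallback(os.path.join(tmp, "gbk.txt")))
```

Order matters here. UTF-8 goes first because it rejects most byte sequences that are not UTF-8. GBK, by contrast, accepts many arbitrary byte pairs and may "succeed" with garbled text. If you only need to read the file and can tolerate a few wrong characters, open() also accepts errors='replace'.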
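A side note: the code above opens the file with a context manager (the with statement), so it does not need to be closed explicitly. The file is closed automatically when the block exits. If you call open() directly, always remember to call close() yourself. Closing releases the file handle and the resources tied to it.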
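Roughly, the with statement gives the same guarantee as an explicit try/finally: the file is closed even if an exception is raised while reading. The sketch below compares the two. The temporary file and its contents are only there for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("第一行\n第二行\n")

# Without a context manager: close() in finally runs even if reading fails.
file_object = open(path, "r", encoding="utf-8")
try:
    lines = file_object.readlines()
finally:
    file_object.close()

# With a context manager: the file is closed automatically when the block ends.
with open(path, "r", encoding="utf-8") as file_object2:
    lines2 = file_object2.readlines()

print(file_object.closed, file_object2.closed)  # → True True
print(lines2)
```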
The failing code, this time without a context manager:
    file_object = open('有中文字體或者是gbk編碼的文檔.txt', 'r')   # no encoding given
    for line in file_object:        # read line by line
        print(line.strip())         # strip the newline (and surrounding whitespace)
    file_object.close()             # close the file
The fix, again, is to add the encoding:
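    file_object = open('毛概.txt', 'r', encoding='utf-8')   # add the encoding (must match the file's; UTF-8 here)
    for line in file_object:        # read line by line
        print(line.strip())         # strip the newline (and surrounding whitespace)
    file_object.close()             # close the file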
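The encoding argument has to match how the file was actually saved. UTF-8 is not always the right answer: a GBK-encoded file needs encoding='gbk'. The sketch below writes a GBK file as a stand-in for a GBK document; the file name is made up. It then reads the file with the same line-by-line loop, using with so that no explicit close() is needed.

```python
import os
import tempfile

# Hypothetical GBK-encoded file standing in for a real GBK document.
path = os.path.join(tempfile.mkdtemp(), "gbk_demo.txt")
with open(path, "w", encoding="gbk") as f:
    f.write("第一行\n第二行\n")

# Decoding GBK bytes as UTF-8 fails...
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ...while the matching encoding works; the with block closes the file.
stripped = []
with open(path, "r", encoding="gbk") as f:
    for line in f:                      # read line by line
        stripped.append(line.strip())   # drop the newline
print(utf8_ok, stripped)
```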