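When reading a TXT file containing Chinese text with pandas, an error is raised (note that the file paths themselves also contain Chinese characters):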
    import pandas as pd

    inputfile1 = 'data/meidi_jd_process_end_正面情感結果.txt'
    inputfile2 = 'data/meidi_jd_process_end_負面情感結果.txt'
    outputfile1 = 'data/meidi_jd_neg.txt'
    outputfile2 = 'data/meidi_jd_pos.txt'

    data1 = pd.read_csv(inputfile1, encoding='utf-8', header=None)
    data2 = pd.read_csv(inputfile2, encoding='utf-8', header=None)

    OSError: Initializing from file failed
I remembered running into a similar situation before, where the fix was to add engine='python':
    import pandas as pd

    inputfile1 = 'data/meidi_jd_process_end_正面情感結果.txt'
    inputfile2 = 'data/meidi_jd_process_end_負面情感結果.txt'
    outputfile1 = 'data/meidi_jd_neg.txt'
    outputfile2 = 'data/meidi_jd_pos.txt'

    data1 = pd.read_csv(inputfile1, encoding='utf-8', header=None, engine='python')
    data2 = pd.read_csv(inputfile2, encoding='utf-8', header=None, engine='python')

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
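Adding engine='python' gets past the OSError, but a UnicodeDecodeError is raised instead, which suggests that the original file is probably not encoded as UTF-8. Opening the TXT file in Notepad++ and converting its encoding to UTF-8 resolves the problem.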
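If Notepad++ is not at hand, the same check and conversion can be done in Python. A byte 0xff at position 0 is consistent with a UTF-16 little-endian byte order mark (FF FE), so a reasonable first step is to look at the first few bytes of the file. The following is a minimal sketch using only the standard library. The helper names guess_encoding_from_bom and convert_to_utf8 are hypothetical, introduced here for illustration, and the sketch assumes the file starts with a BOM. A file in a legacy encoding without a BOM (GBK, for example) would not be detected this way and falls through to the default.

```python
import codecs

# BOM signatures, checked longest-first because the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding_from_bom(path, default="utf-8"):
    """Return a codec name based on the file's BOM, or `default` if none is found."""
    with open(path, "rb") as f:
        head = f.read(4)
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return default


def convert_to_utf8(src, dst):
    """Re-encode `src` as UTF-8 without a BOM, write it to `dst`, and
    return the encoding that was assumed for `src`."""
    encoding = guess_encoding_from_bom(src)
    # newline="" keeps the original line endings untouched in both directions.
    with open(src, "r", encoding=encoding, newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)
    return encoding
```

Instead of converting the file, the detected name can also be passed straight to pandas, for example pd.read_csv(inputfile1, encoding=guess_encoding_from_bom(inputfile1), header=None, engine='python'). Note that a UTF-16 file saved from Windows Notepad is often tab-separated or single-column, so the sep argument may also need adjusting.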