When using pandas to read TXT files that contain Chinese, I got an error:
```python
import pandas as pd

inputfile1 = 'data/meidi_jd_process_end_正面情感结果.txt'  # positive-sentiment results
inputfile2 = 'data/meidi_jd_process_end_负面情感结果.txt'  # negative-sentiment results
outputfile1 = 'data/meidi_jd_pos.txt'  # output for the positive results
outputfile2 = 'data/meidi_jd_neg.txt'  # output for the negative results

data1 = pd.read_csv(inputfile1, encoding='utf-8', header=None)
data2 = pd.read_csv(inputfile2, encoding='utf-8', header=None)
```

```
OSError: Initializing from file failed
```
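This OSError is commonly reported when pandas' default C parser is given a file path containing non-ASCII characters, such as the Chinese file names here, notably with some older pandas versions on Windows. That cause is an assumption; the error message itself does not say so. Besides switching the parser engine, a commonly used workaround is to open the file with Python's built-in `open()` and pass pandas the file object instead of the path. A minimal sketch, where `read_txt_via_handle` is a hypothetical helper name:

```python
import pandas as pd


def read_txt_via_handle(path, encoding="utf-8"):
    """Read a header-less text file by handing pandas an open file object."""
    # open() handles the (possibly non-ASCII) path and the decoding, so
    # pandas only sees an already-decoded text stream, not the file name.
    with open(path, encoding=encoding) as f:
        return pd.read_csv(f, header=None)
```

The `encoding` argument still has to match the file's real encoding; this only sidesteps the path handling.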
I remembered running into something similar before; the fix was to add `engine='python'`:
```python
import pandas as pd

inputfile1 = 'data/meidi_jd_process_end_正面情感结果.txt'  # positive-sentiment results
inputfile2 = 'data/meidi_jd_process_end_负面情感结果.txt'  # negative-sentiment results
outputfile1 = 'data/meidi_jd_pos.txt'  # output for the positive results
outputfile2 = 'data/meidi_jd_neg.txt'  # output for the negative results

# engine='python' replaces the default C parser with the pure-Python parser
data1 = pd.read_csv(inputfile1, encoding='utf-8', header=None, engine='python')
data2 = pd.read_csv(inputfile2, encoding='utf-8', header=None, engine='python')
```

```
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
```
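The byte 0xff at position 0 is a useful clue, because several Unicode encodings mark the start of a file with a byte order mark (BOM). A minimal sketch, assuming the file starts with such a BOM, that reads the first few bytes and maps them to a Python codec name; `sniff_bom` is a hypothetical helper, not part of pandas:

```python
import codecs


def sniff_bom(path):
    """Guess a text file's encoding from its byte order mark, if it has one."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Check UTF-32 first: its little-endian BOM (FF FE 00 00) begins with
    # the UTF-16 little-endian BOM (FF FE).
    if head.startswith(codecs.BOM_UTF32_LE) or head.startswith(codecs.BOM_UTF32_BE):
        return "utf-32"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith(codecs.BOM_UTF16_LE) or head.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    # No BOM: the file may be plain UTF-8, GBK, etc.; the BOM alone cannot tell.
    return None
```

A file that begins with FF FE (and not FF FE 00 00) comes back as `"utf-16"`, which matches the 0xff error above. Files without a BOM return `None`, so this only narrows things down; a non-`None` result can be passed as `encoding` to `pd.read_csv`.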
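Adding `engine='python'` got past the OSError, but now a UnicodeDecodeError was raised. This suggests the original files are not actually encoded in UTF-8: a 0xff byte at position 0 is typically the first byte of a UTF-16 little-endian byte order mark (FF FE). Opening the TXT files in Notepad++ and converting their encoding to UTF-8 fixes the problem.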
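If you would rather not convert the files by hand, the same re-encoding can be scripted. A minimal sketch, assuming the source files are UTF-16 with a BOM (consistent with the 0xff error above); `convert_to_utf8` and its parameters are hypothetical names:

```python
from pathlib import Path


def convert_to_utf8(src, dst, src_encoding="utf-16"):
    """Re-encode a text file as UTF-8 (a scripted stand-in for Notepad++)."""
    # The 'utf-16' codec uses the FF FE / FE FF BOM to pick the byte order
    # and strips the BOM from the decoded text.
    text = Path(src).read_text(encoding=src_encoding)
    # Write plain UTF-8 without a BOM, matching encoding='utf-8' in pandas.
    Path(dst).write_text(text, encoding="utf-8")
```

After conversion, `pd.read_csv(dst, encoding='utf-8', header=None)` should decode the file. Passing `encoding='utf-16'` straight to `pd.read_csv` should also work and avoids rewriting the files; converting once is simpler if other tools expect UTF-8 as well.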