pandas (Python 2): reading Chinese data and handling Chinese column names

Reposted article; original dated 2017-02-20 22:27. Tag: python

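Key points:

Change Python's default encoding to utf-8.
When reading a csv or xls file, pass the parameter encoding="gbk"; if gbk cannot decode the data, use 'gb18030', which covers a wider range of characters.
When using a Chinese column name, call decode('utf-8') on it or write it as u'中文列名'; for a once-and-for-all fix, add from __future__ import unicode_literals.
Use the codecs module to read Chinese text.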
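The gbk-then-gb18030 fallback from the key points can be sketched with the standard library alone. This is a minimal sketch, not the article's own code: the helper name pick_encoding and the sample bytes are hypothetical, and it assumes the whole file fits in memory.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def pick_encoding(raw_bytes, candidates=("gbk", "gb18030")):
    """Return the first candidate encoding that decodes raw_bytes without error.

    pick_encoding is a hypothetical helper written for this illustration.
    """
    for name in candidates:
        try:
            raw_bytes.decode(name)
        except UnicodeDecodeError:
            continue  # this codec cannot decode the data, try the next one
        return name
    raise ValueError("none of the candidate encodings could decode the data")


# Sample CSV bytes containing U+3400, a CJK Extension A character that
# GB18030 can represent but GBK cannot.
sample = "序號,成交金額\n1,\u3400\n".encode("gb18030")
print(pick_encoding(sample))
```

The returned name could then be passed on to the reader, for example pd.read_csv(path, encoding=name), where path is whatever file the bytes came from.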

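# -*- coding: utf-8 -*-
import sys
reload(sys)  # Python 2 only: restores sys.setdefaultencoding, which site.py removes at startup
sys.setdefaultencoding('utf8')
import pandas as pd

path_1 = 'brokerUserfeeList.xls'

x = pd.read_excel(path_1, encoding="gbk")
print x.columns
# The column labels are unicode, so decode the byte-string literal before the lookup
print x["成交金額".decode('utf-8')]

#print x[u"成交金額"]  # preferred: use the u prefix or the __future__ import, for Python 3 compatibility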

#### Output:

Index([u'序號', u'成交金額'], dtype='object')
0 11,053.00
1 43,935.40
2 467,327.83
3 32,811.07
4 17,651.10
5 4,629.80
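The byte-string versus unicode distinction behind the column lookup can be illustrated without pandas. The sketch below is an assumption-laden illustration: to_text is a hypothetical helper, and the list of labels merely stands in for column names that arrive in mixed form.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def to_text(label, encoding="utf-8"):
    """Return label as unicode text, decoding it first if it is a byte string.

    to_text is a hypothetical helper written for this illustration.
    """
    if isinstance(label, bytes):
        return label.decode(encoding)
    return label


# One label is already text, the other is still UTF-8 encoded bytes.
columns = ["序號", "成交金額".encode("utf-8")]
normalized = [to_text(c) for c in columns]
print(normalized == ["序號", "成交金額"])
```

With unicode_literals in effect, every plain string literal is already unicode, which is why the explicit decode('utf-8') or u prefix is no longer needed for literals in the file.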

=======================================================

On Windows, Chinese text can be read by calling decode('gbk') on the bytes after reading, which decodes them into unicode:

open(u'C:\\Users\\Administrator\\Desktop\\222.txt' ).read().decode('gbk')

When writing, use encode('gbk') to encode the unicode string into bytes before writing it out:

ttt = u'看了看打扮卡了號地塊編碼，vas'

with open(u'c:\\Users\\Administrator\\Desktop\\222222.txt', 'w') as f:
    f.write(ttt.encode('gbk'))

The codecs module is recommended: codecs.open() takes an encoding parameter and handles the encoding step directly.

import codecs

with codecs.open(u'c:\\Users\\Administrator\\Desktop\\2222.txt', 'w', encoding='gbk') as f:
    f.write(ttt)
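A write-then-read round trip shows what the encoding parameter does. This sketch swaps in io.open for codecs.open, since it accepts the same encoding parameter and is the built-in open on Python 3. The temporary directory, file name and sample text are assumptions standing in for the Desktop paths above.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import io
import os
import tempfile

text = "中文編碼測試, vas"

# A temporary directory stands in for the Desktop path used in the article.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample_gbk.txt")

# Writing: unicode text goes in, GBK bytes land on disk.
with io.open(path, "w", encoding="gbk") as f:
    f.write(text)

# Reading: GBK bytes on disk come back as unicode text.
with io.open(path, "r", encoding="gbk") as f:
    round_trip = f.read()

# Binary mode shows the raw bytes that were actually stored.
with io.open(path, "rb") as f:
    raw = f.read()

print(round_trip == text)
print(raw == text.encode("gbk"))

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```

Keeping text as unicode inside the program and letting the file object encode and decode at the boundary avoids scattering encode('gbk') and decode('gbk') calls through the code.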
