I recently wrote a few crawlers in Python. While scraping a page encoded in gb2312, the following exception was raised:
- UnicodeEncodeError: 'ascii' codec can't encode characters in position 21-23: ordinal not in range(128)
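As a rough illustration (not the original crawler code), the same kind of error appears whenever non-ASCII text is encoded with the ascii codec, which is what Python 2 does implicitly when unicode and byte strings are mixed. The sample string and the helper name `try_ascii` are made up for this sketch.

```python
# -*- coding: utf-8 -*-
# Hypothetical sample text: four Chinese characters as a unicode string.
text = u'\u4e2d\u6587\u7f51\u9875'

def try_ascii(value):
    """Encode with the ascii codec and return the error message on failure."""
    try:
        return value.encode('ascii')
    except UnicodeEncodeError as exc:
        return str(exc)

message = try_ascii(text)
print(message)
```

The printed message has the same shape as the one above; only the reported positions depend on where the non-ASCII characters sit in the string.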
The fix that worked for me:
First, set the system default encoding to utf-8 (Python 2 only):
- import sys
- reload(sys)  # restores sys.setdefaultencoding, which site.py deletes at startup
- sys.setdefaultencoding('utf-8')
Then decode the page as GBK (a superset of gb2312) and convert it to utf-8:
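- import urllib2
- result = urllib2.urlopen(req).read()  # raw bytes; req is assumed to be built earlier
- result = unicode(result, 'GBK').encode('UTF-8')  # decode as GBK, re-encode as UTF-8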
After that, everything worked.
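The decode-then-encode step can also be wrapped in a small helper. This is only a sketch under assumptions: the name `gbk_to_utf8`, its `source_encoding` parameter, and the `'replace'` error handling are not from the original code, and the sample bytes stand in for the body returned by `read()`. Because the decoding and encoding are spelled out explicitly, this step by itself does not rely on the default encoding.

```python
# -*- coding: utf-8 -*-

def gbk_to_utf8(raw, source_encoding='gbk'):
    """Decode raw page bytes and return them re-encoded as UTF-8 bytes.

    Hypothetical helper: malformed bytes are replaced with U+FFFD
    instead of raising, which is an assumption about the desired behavior.
    """
    text = raw.decode(source_encoding, 'replace')
    return text.encode('utf-8')

# Stand-in for the bytes a GBK page would return from read().
sample = u'\u4f60\u597d'.encode('gbk')
converted = gbk_to_utf8(sample)

# Round-trip check: the UTF-8 bytes decode back to the same text.
print(converted.decode('utf-8') == u'\u4f60\u597d')
```

Using `'replace'` keeps a crawler running when a page contains a few invalid bytes. Pass `'strict'` instead if silent substitution is not acceptable.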