Detecting String Encodings and Converting Between Them in Python

Reposted article, dated 2015-04-04 13:50. Category: Python miscellany / Python

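Detecting a string's encoding

The chardet library makes it easy to detect the encoding of a string or a file. This matters most for Chinese web pages: some use GBK/GB2312 and others use UTF-8, so if you need to crawl pages, you have to know each page's encoding before you can decode it.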

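>>> from urllib.request import urlopen   # Python 3; Python 2 used urllib.urlopen
>>> html = urlopen('http://www.chinaunix.net').read()   # raw bytes, not yet decoded

>>> import chardet   # third-party package: pip install chardet
>>> chardet.detect(html)   # output below is as recorded in the original post
{'confidence': 0.98999999999999999, 'encoding': 'GB2312'}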

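The function returns a dictionary with two entries: the confidence of the detection (between 0 and 1) and the detected encoding. Newer chardet releases add a third key, 'language'.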
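
The result dictionary can drive the actual decoding step. The helper below is a minimal sketch and not part of chardet: `decode_with_detection`, `min_confidence` and `fallback` are names assumed here for illustration, and the 0.5 threshold is an arbitrary choice. It takes a dictionary shaped like the one `chardet.detect` returns and falls back to a default encoding when the guess is missing or weak.

```python
def decode_with_detection(data, detection, min_confidence=0.5, fallback="utf-8"):
    """Decode bytes using a chardet-style result dict.

    `detection` is expected to look like {'encoding': ..., 'confidence': ...}.
    Hypothetical helper for illustration only.
    """
    encoding = detection.get("encoding")
    confidence = detection.get("confidence") or 0.0
    # Use the fallback when there is no guess or the guess is too weak
    if not encoding or confidence < min_confidence:
        encoding = fallback
    # errors="replace" substitutes undecodable bytes instead of raising
    return data.decode(encoding, errors="replace")


# Hand-written result dict, so chardet itself is not needed to run this
sample = "编码检测".encode("gb2312")
text = decode_with_detection(sample, {"encoding": "GB2312", "confidence": 0.99})
print(text)
```

With chardet installed, the page fetched above could be decoded with decode_with_detection(html, chardet.detect(html)).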

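Converting between encodings

Decode the source bytes to Unicode first, then encode the Unicode text to the target encoding. For example, to convert UTF-8 to GB2312: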

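>>> import chardet
>>> # Simplified characters: GB2312 cannot encode the traditional form 們
>>> raw = "我们".encode('utf-8')   # chardet.detect() expects bytes
>>> print(chardet.detect(raw))   # outputs are as recorded in the original post
{'confidence': 0.7525, 'encoding': 'utf-8'}

>>> text = raw.decode('utf-8')   # bytes -> Unicode text

>>> gb = text.encode('gb2312')   # Unicode text -> GB2312 bytes
>>> print(chardet.detect(gb))
{'confidence': 0.8095977270813678, 'encoding': 'TIS-620'}

Note that the second detection is wrong: the GB2312 bytes are reported as TIS-620, a Thai encoding. chardet guesses statistically, and a four-byte sample is too short for a reliable result, so detections on very short inputs should not be trusted even when the confidence looks reasonable.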
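
The decode-then-encode pattern above can be wrapped in a small function. This is a sketch using only the standard library, and `transcode` is a hypothetical name chosen for illustration. It also shows a limit of the target encoding: GB2312 covers simplified characters only, so the traditional form 們 cannot be encoded, while GBK can hold it.

```python
def transcode(data, src, dst):
    """Re-encode bytes from `src` to `dst` by going through str (Unicode)."""
    text = data.decode(src)   # bytes -> str
    return text.encode(dst)   # str -> bytes


utf8_bytes = "我们".encode("utf-8")
gb_bytes = transcode(utf8_bytes, "utf-8", "gb2312")
# UTF-8 uses 3 bytes per character here, GB2312 uses 2
print(len(utf8_bytes), len(gb_bytes))

# GB2312 has no code for the traditional character 們, so encoding fails
try:
    "我們".encode("gb2312")
    traditional_ok = True
except UnicodeEncodeError:
    traditional_ok = False
print(traditional_ok)
```

If the text may contain traditional characters, GBK or UTF-8 is a safer target than GB2312.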
