1. History of character encodings

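2. Python's default encodings

In Python 2.x, the default string encoding is ASCII, and the default source-file encoding is also ASCII. ("String" here means a Python string object; "file" means the .py source file.)
In Python 3.x, strings are Unicode, and the default source-file encoding is UTF-8.

1. Python 2's default string encoding is ASCII (Chinese is not supported)

The line #-*-coding:utf-8-*- tells the Python interpreter that the text in this .py file is encoded as UTF-8. The interpreter then reads the source bytes as UTF-8 and converts them to Unicode for internal processing.
The declaration must name the encoding the .py file is actually stored in on disk; that is, the encoding in #-*-coding:XXX-*- has to match the file's real encoding.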

#!/usr/bin/env python
# -*- coding: utf-8 -*-
print "你好,世界"
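To see why the declaration has to match the bytes on disk, here is a minimal Python 3 sketch. It relies on the fact that compile() accepts source code as bytes and honors the coding declaration in it. The helper run_source, the in-memory "file", and the choice of latin-1 as the wrong declaration are illustrative assumptions, not part of the original example.

```python
# Body of a tiny module, without its coding line.
body = 'text = "你好"\n'

def run_source(declared, saved_as):
    """Run source bytes stored as `saved_as` but declared as `declared`."""
    header = '# -*- coding: {} -*-\n'.format(declared)
    source_bytes = (header + body).encode(saved_as)
    namespace = {}
    exec(compile(source_bytes, '<demo>', 'exec'), namespace)
    return namespace['text']

matched = run_source('gbk', 'gbk')            # declaration matches the bytes
mismatched = run_source('latin-1', 'utf-8')   # declaration does not match

# ascii() keeps the output printable on any console encoding.
print(ascii(matched))
print(ascii(mismatched))
```

With a matching declaration the literal comes back as the two intended characters. With the mismatched one, each UTF-8 byte is read as a separate latin-1 character, so the program runs without error but the string is garbled.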

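2. Python 2's default source-file encoding is ASCII

Note: sys.getdefaultencoding(), used below, reports the encoding Python uses for implicit str/unicode conversions. It is a separate setting from the source-file default, although in Python 2 both are ASCII.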

C:\Windows\system32>python2 D:\test\python\test\test2.py

#coding:utf8
import sys
print(sys.getdefaultencoding())    # result: ascii

3. Python 3 strings are Unicode (Chinese is supported)

#!/usr/bin/env python3
print("你好,世界")

4. Python 3's default source-file encoding is UTF-8

C:\Windows\system32>python3 D:\test\python\test\test3.py

import sys
print(sys.getdefaultencoding())    # result: utf-8

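3. The two string types in Python

Python 2 has both str (a byte string) and unicode, but it lets you mix them: when a str is concatenated with a unicode, the str is implicitly decoded to unicode using the default encoding (ASCII).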

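# Python 2.7
# coding:utf8
a = 'hello'
b = '中国'
print(a, type(a))          # result: ('hello', <type 'str'>)
print(b, type(b))          # result: ('\xe4\xb8\xad\xe5\x9b\xbd', <type 'str'>)    # In Python 2, print(b, type(b)) prints a tuple, so the string is shown as its repr (the escaped UTF-8 bytes). This only happens in Python 2.
print(b)                   # result: 中国

c = 'hello' + u'中国'    # the u prefix makes a unicode literal; 'hello' is implicitly decoded to unicode
print(c, type(c))          # result: (u'hello\u4e2d\u56fd', <type 'unicode'>)    # again the repr of a tuple element, here showing the code points
print(c)                   # result: hello中国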

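Python 3 strictly separates bytes and str, and the two cannot be concatenated. Text is always Unicode and is represented by str; binary data is represented by bytes.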

a = b'hello'              # the b prefix creates a bytes literal
b = '中国'

print(a, type(a))            # result: b'hello' <class 'bytes'>
print(b, type(b))            # result: 中国 <class 'str'>

# c = b'hello' + '中国'    # raises TypeError: can't concat str to bytes
# print(c, type(c))
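Since Python 3 refuses to mix the two types, one side has to be converted explicitly before concatenating. The following is a minimal sketch of both directions; the variable names are illustrative, and UTF-8 is an assumed choice of encoding.

```python
greeting = b'hello '
country = '中国'

try:
    greeting + country                # mixing bytes and str is rejected
except TypeError as exc:
    print('TypeError:', exc)

# Option 1: decode the bytes and keep working with text.
as_text = greeting.decode('ascii') + country

# Option 2: encode the text and keep working with bytes.
as_bytes = greeting + country.encode('utf-8')

print(ascii(as_text))
print(as_bytes)
```

Which option fits depends on what the result is for: text for display and string processing, bytes for writing to a file or a socket.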

Example 1:

Python 2.7, run in the Windows cmd console

# coding:utf8                 # tells the interpreter that this .py file is encoded as UTF-8, so the source is read as UTF-8

str1 = 'hengha'
print(str1, type(str1))          # result: ('hengha', <type 'str'>)    # In Python 2.x a plain string literal is a str, i.e. a byte string.

str2 = '哼哈'
print(str2, type(str2))          # result: ('\xe5\x93\xbc\xe5\x93\x88', <type 'str'>)    # repr of the tuple element: the escaped UTF-8 bytes
print(str2)                      # result: 鍝煎搱    # The bytes written are UTF-8, but cmd interprets them as GBK, so the output is garbled.

uutf = str2.decode('utf8')    # str2 is a byte string, so decode it to get unicode
print(uutf, type(uutf))          # result: (u'\u54fc\u54c8', <type 'unicode'>)    # repr of the tuple element: the code points
print(uutf)                      # result: 哼哈

x = 1
print(x, type(x))                # result: (1, <type 'int'>)
y = 1.1
print(y, type(y))                # result: (1.1, <type 'float'>)
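The garbled console output in Example 1 can be reproduced in Python 3 without a GBK console, by decoding the UTF-8 bytes with the wrong codec on purpose. This is only a sketch of the mechanism; the variable names are illustrative.

```python
text = '哼哈'

utf8_bytes = text.encode('utf-8')      # what the script writes out
garbled = utf8_bytes.decode('gbk')     # what a GBK console makes of those bytes

# The damage is reversible as long as no bytes were lost along the way.
recovered = garbled.encode('gbk').decode('utf-8')

# ascii() keeps the output printable on any console encoding.
print(utf8_bytes)
print(ascii(garbled))
print(ascii(recovered))
```

The six UTF-8 bytes are regrouped into three two-byte GBK characters, which is why two characters turn into three unrelated ones.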

Example 2:

Python 3.8, run in the Windows cmd console

str1 = 'hengha'
print(str1, type(str1))        # result: hengha <class 'str'>    # In Python 3.x a string literal is a str, i.e. text.

str2 = '哼哈'
print(str2, type(str2))        # result: 哼哈 <class 'str'>    # In Python 3.x a string literal is a str, i.e. Unicode text.
print(str2)                    # result: 哼哈

uutf = str2.encode('utf8')    # str2 is text, so encode it to get bytes
print(uutf, type(uutf))        # result: b'\xe5\x93\xbc\xe5\x93\x88' <class 'bytes'>
print(uutf)                    # result: b'\xe5\x93\xbc\xe5\x93\x88'

x = 1
print(x, type(x))              # result: 1 <class 'int'>
y = 1.1
print(y, type(y))              # result: 1.1 <class 'float'>
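The difference between the two types in Example 2 is easiest to see by counting: a str is measured in code points, while bytes are measured in bytes, and the byte count depends on the encoding. A small Python 3 sketch, where the list of encodings is just an illustrative selection:

```python
text = '哼哈'

code_points = [hex(ord(ch)) for ch in text]
print(len(text), code_points)          # 2 ['0x54fc', '0x54c8']

# The same two characters take a different number of bytes per encoding.
sizes = {}
for name in ('utf-8', 'gbk', 'utf-16-le'):
    data = text.encode(name)
    sizes[name] = len(data)
    print(name, len(data), data)
```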

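4. Encoding and decoding in Python

In Python 3, a string is Unicode text (str). To turn it into bytes, encode it.
In Python 2, a plain string is bytes (str). To turn it into Unicode text, decode it, using the same encoding the .py file was saved in.

1. Encoding and decoding with encode and decode

Both methods exist in Python 2 and Python 3. The session below is from Python 3; in Python 2 the same round trip would start from a unicode string such as u'hello 中国'.

>>> hh = 'hello 中国'
>>> hh.encode('utf-8')        # encode 'hello 中国' from Unicode to UTF-8
b'hello \xe4\xb8\xad\xe5\x9b\xbd'
>>> hh.encode('gbk')          # encode 'hello 中国' from Unicode to GBK
b'hello \xd6\xd0\xb9\xfa'
>>> hh.encode('ascii')        # ASCII has no Chinese characters, so this raises an exception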
Traceback (most recent call last):
  File "<pyshell#54>", line 1, in <module>
    hh.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 6-7: ordinal not in range(128)

>>> b'hello \xe4\xb8\xad\xe5\x9b\xbd'.decode('utf-8')    # decode to Unicode, telling the decoder the bytes are UTF-8
'hello 中国'
>>> b'hello \xe4\xb8\xad\xe5\x9b\xbd'.decode('gbk')      # decode as GBK; the bytes are really UTF-8, so this raises an exception
Traceback (most recent call last):
  File "<pyshell#58>", line 1, in <module>
    b'hello \xe4\xb8\xad\xe5\x9b\xbd'.decode('gbk')
UnicodeDecodeError: 'gbk' codec can't decode byte 0xad in position 8: illegal multibyte sequence
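Both exceptions above come from the default strict error handling. encode and decode also take an errors argument, and the sketch below shows a few of the standard handlers in Python 3. Whether a lossy result is acceptable depends on the application, so treat this as an illustration rather than a recommendation.

```python
text = 'hello 中国'
utf8_bytes = text.encode('utf-8')

# Encoding: characters the target codec lacks can be replaced or escaped.
replaced = text.encode('ascii', errors='replace')
escaped = text.encode('ascii', errors='backslashreplace')
print(replaced)
print(escaped)

# Decoding with the wrong codec: catch the error ...
try:
    utf8_bytes.decode('gbk')
except UnicodeDecodeError as exc:
    print('decode failed:', exc.reason)

# ... or substitute U+FFFD for the bytes that cannot be decoded.
lossy = utf8_bytes.decode('gbk', errors='replace')
print(ascii(lossy))
```

Note that errors='replace' hides the problem without fixing it; the reliable fix is still to decode with the encoding the bytes were written in.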

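2. Encoding and decoding with bytes and str

This works only in Python 3, not in Python 2.
In Python 3, bytes(text, encoding) does the same job as text.encode(encoding), and str(data, encoding) does the same job as data.decode(encoding).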

# Python 3
hh = 'hello 中国'
print(hh)
d=hh.encode('utf-8')
print(d)
e=d.decode('utf-8')
print(e)

d2=bytes(hh,'utf-8')
print(d2)
e2=str(d2,'utf-8')
print(e2)
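One detail worth knowing when using the constructors: the encoding argument is what makes them encode or decode. The Python 3 sketch below checks the equivalence and shows what happens when the argument is left out; the variable names are illustrative.

```python
text = 'hello 中国'
data = text.encode('utf-8')

# With an encoding, the constructors match encode/decode.
same_bytes = bytes(text, 'utf-8') == data
same_text = str(data, 'utf-8') == data.decode('utf-8') == text
print(same_bytes, same_text)

# Without an encoding, str() does not decode; it returns the repr of the bytes.
as_repr = str(data)
print(as_repr)

# Without an encoding, bytes() refuses a str argument.
try:
    bytes(text)
except TypeError as exc:
    print('TypeError:', exc)
```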

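5. Garbled text (mojibake) in Python

1. Possible causes of encoding problems

The Python interpreter's default encoding
The encoding used by the terminal
The encoding of the Python source file
The operating system's language settings

2. Encodings that support Chinese

UTF-8, GBK and GB2312 (Unicode itself is the character set these encodings map to and from).
- UTF-8 is used internationally, commonly for databases and source code.
- GBK is used by, for example, the Windows cmd console.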
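3. Where encodings come into play when Python runs a program

The encoding used when the .py file is saved
The encoding the interpreter uses to read the file (the default, or the one declared in the .py file); it must match the encoding the file was saved in
Once loaded into memory, Python 3 strings are Unicode; in Python 2, plain str values keep the bytes of the source encoding
On output, Python 2 writes those str bytes unchanged, while Python 3 encodes text with the output stream's encoding
The encoding of the terminal that displays the result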
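When output is garbled, a practical first step is to print the encodings involved at each of the stages listed above. The following Python 3 sketch collects them with standard-library calls; the report layout and labels are illustrative choices.

```python
import locale
import sys

report = {
    'default str/bytes encoding': sys.getdefaultencoding(),
    # Some replaced stdout objects have no encoding, hence getattr.
    'stdout encoding': getattr(sys.stdout, 'encoding', None),
    'filesystem encoding': sys.getfilesystemencoding(),
    'locale preferred encoding': locale.getpreferredencoding(False),
}

for name, value in report.items():
    print('{:28} {}'.format(name, value))
```

If the stdout encoding and the terminal's actual encoding disagree, text is encoded one way and displayed another, which produces the kind of garbling shown in Example 1.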
