Python Character Encoding Conversion: Unicode and str

Reposted article. 2019-09-15 17:26. Category: Python

Reference 1: https://blog.csdn.net/VictoriaW/article/details/75314737

Reference 2: https://blog.csdn.net/sheldonwong/article/details/86684761

Unicode and str

## str

In Python 2, the quoted string literals we normally write are of type str.
>>> x = '哈哈'
>>> x
'\xb9\xfe\xb9\xfe'
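From the output above, we can see that x, a str, actually holds a byte sequence rather than text, even though what we assigned to it looks like a string. Why does this happen?
The reason is simple: the characters were implicitly encoded before being stored. A Python 2 str literal keeps the raw bytes supplied by the console or source file, so the bytes reflect that environment's encoding. Here \xb9\xfe is the GBK encoding of 哈, which indicates the console was using GBK.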
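
To make the point about implicit encoding concrete, here is a minimal sketch that performs the encoding step explicitly. The variable names are illustrative only, and the text is written with \u escapes so the result does not depend on the console or source-file encoding. It should run on Python 2.6+ as well as Python 3.

```python
# The two-character text from the example above, written with escapes.
text = u'\u54c8\u54c8'

# The same text becomes different byte sequences under different encodings.
gbk_bytes = text.encode('gbk')      # the bytes the GBK console produced
utf8_bytes = text.encode('utf-8')   # what a UTF-8 console would have produced

print(repr(gbk_bytes))
print(repr(utf8_bytes))

# Decoding with the matching codec recovers the original text.
assert gbk_bytes.decode('gbk') == text
assert utf8_bytes.decode('utf-8') == text
```

The GBK result matches the bytes shown in the interpreter session above, which is why that output is best read as encoded bytes rather than as characters.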

## unicode

Prefixing the opening quote with the character u gives a unicode string:
>>> x = u'哈哈'
>>> x
u'\u54c8\u54c8'
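A unicode object stores the text itself, as a sequence of code points, rather than a byte sequence. In the example above, the unicode string contains two U+54C8 characters.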
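
A small sketch of the difference between counting characters and counting bytes may help here. The names below are illustrative, and the code is written to run on both Python 2.6+ and Python 3.

```python
text = u'\u54c8\u54c8'

# A unicode string is indexed by character (code point).
code_points = [hex(ord(ch)) for ch in text]
char_count = len(text)

# Once encoded, the length is measured in bytes and depends on the codec.
utf8_byte_count = len(text.encode('utf-8'))
gbk_byte_count = len(text.encode('gbk'))

print(code_points)
print(char_count)
print(utf8_byte_count)
print(gbk_byte_count)
```

The character count stays the same whatever encoding is chosen later; only the byte counts change.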

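To avoid errors, encode a unicode string with utf-8 or gbk before writing it to a file. Writing a unicode object directly makes Python 2 encode it implicitly with the ascii codec, which fails for non-ASCII characters such as 哈.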
>>> x = u'哈哈'
>>> x
u'\u54c8\u54c8'
>>> f = open('test.txt', 'w')
>>> x = x.encode('utf-8')  # unicode -> str
>>> x
'\xe5\x93\x88\xe5\x93\x88'
>>> f.write(x)
>>> f.close()
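
As an alternative to encoding by hand, the standard library's `io.open` accepts an `encoding` argument and does the conversion on write. The sketch below is one possible way to do it, not the method used in the session above; the temporary directory is an assumption made so the example does not touch the current directory. It should run on Python 2.6+ and Python 3.

```python
import io
import os
import tempfile

text = u'\u54c8\u54c8'

# Hypothetical scratch location; the session above writes 'test.txt' in the cwd.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')

# Text mode with an explicit encoding: write unicode, the file gets UTF-8 bytes.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Binary mode shows the bytes that actually landed on disk.
with io.open(path, 'rb') as f:
    raw = f.read()

# Reading back with the same encoding returns the original unicode text.
with io.open(path, 'r', encoding='utf-8') as f:
    round_trip = f.read()

print(repr(raw))
```

Using `with` also closes the file automatically, so the data is flushed to disk even if an error occurs in between.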

Unicode strings can be encoded as plain strings in a variety of ways, depending on the encoding you choose:

    # Convert Unicode to a plain Python string: "encode"
    unicodestring = u"Hello world"
    utf8string = unicodestring.encode("utf-8")
    asciistring = unicodestring.encode("ascii")
    isostring = unicodestring.encode("ISO-8859-1")
    utf16string = unicodestring.encode("utf-16")

    # Convert a plain Python string to Unicode: "decode"
    # (unicode() is a Python 2 built-in; it does not exist in Python 3)
    plainstring1 = unicode(utf8string, "utf-8")
    plainstring2 = unicode(asciistring, "ascii")
    plainstring3 = unicode(isostring, "ISO-8859-1")
    plainstring4 = unicode(utf16string, "utf-16")

    # All four round trips yield the same unicode string
    assert plainstring1 == plainstring2 == plainstring3 == plainstring4
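
The round trip above succeeds for every codec because "Hello world" contains only ASCII characters. The following sketch, with illustrative names, shows what can go wrong once the text contains characters a codec cannot represent, or when bytes are decoded with the wrong codec. It uses only `encode`, `decode`, and the standard `'replace'` error handler, and should run on Python 2.6+ and Python 3.

```python
text = u'\u54c8\u54c8'

# 1. Encoding to a codec that lacks the character raises UnicodeEncodeError.
try:
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# 2. An error handler trades the exception for data loss.
replaced = text.encode('ascii', 'replace')

# 3. Decoding with a different codec than the one used for encoding
#    does not restore the text; here 'replace' guards against any
#    byte pair the gbk codec might reject.
utf8_bytes = text.encode('utf-8')
mojibake = utf8_bytes.decode('gbk', 'replace')

print(ascii_failed)
print(repr(replaced))
print(mojibake != text)
```

The practical rule that follows is to decode bytes with the same encoding that produced them, and to pick an encoding such as utf-8 that can represent every character in the text.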
