Python: reading files with open(), UTF-8, and UnicodeDecodeError

Reposted article, 2020-05-19 12:07. Tags: utf-8 / Python documentation / UnicodeDecodeError / file pointer / open / Python

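Contents:

open() function

Saving a file with UTF-8 encoding

UnicodeDecodeError

f.read() and f.read(size)

f.readline() and f.readlines()

f.tell(): return the file pointer position (mind the newline characters)

f.writelines() and f.write()

f.seek(): set the file pointer position, f.seek(offset[, whence])

The file pointer

with open: no explicit f.close() needed

f.__next__(): read the next line

Detailed mode parameters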

Python reads and writes disk files (I/O) through file objects.

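Main function: open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

file: the file path

mode: the read/write mode; the individual mode characters are explained below

buffering: the buffering policy

encoding: the text encoding

errors: how encoding and decoding errors are handled (for example 'strict', 'ignore', 'replace'); text mode only

newline: mainly a policy for dealing with the different line endings used by different operating systems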
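The encoding and errors parameters can be observed directly on a file object. A minimal sketch, assuming a hypothetical scratch file named demo_encoding.txt in the current directory; ASCII is used deliberately as the wrong codec to show what errors="replace" does:

```python
path = "demo_encoding.txt"  # hypothetical scratch file

# Write one Chinese character explicitly as UTF-8 (3 bytes on disk).
with open(path, "w", encoding="utf-8") as f:
    f.write("中")

# The file object reports the encoding it was opened with.
with open(path, "r", encoding="utf-8") as f:
    used_encoding = f.encoding
    text = f.read()

# errors="replace" substitutes U+FFFD for undecodable bytes instead of raising.
# ASCII cannot decode any of the 3 UTF-8 bytes, so each one becomes U+FFFD.
with open(path, "r", encoding="ascii", errors="replace") as f:
    replaced = f.read()

print(used_encoding, text, repr(replaced))
```

With the default errors="strict", the same ASCII read would raise UnicodeDecodeError instead.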

Explanation of some parameters:

mode (the characters can be combined):

    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated)
    ========= ===============================================================
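A short sketch of how 'w', 'a' and 'x' differ in practice, assuming a hypothetical scratch file demo_modes.txt:

```python
path = "demo_modes.txt"  # hypothetical scratch file

with open(path, "w", encoding="utf-8") as f:   # 'w': create the file, or truncate it
    f.write("first\n")
with open(path, "w", encoding="utf-8") as f:   # 'w' again: "first" is discarded
    f.write("second\n")
with open(path, "a", encoding="utf-8") as f:   # 'a': writes go to the end
    f.write("third\n")

with open(path, "r", encoding="utf-8") as f:
    content = f.read()

refused = False
try:
    open(path, "x", encoding="utf-8")          # 'x': only creates new files
except FileExistsError:
    refused = True

print(repr(content), refused)  # 'second\nthird\n' True
```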

buffering:

    buffering is an optional integer used to set the buffering policy.
    Pass 0 to switch buffering off (only allowed in binary mode), 1 to select
    line buffering (only usable in text mode), and an integer > 1 to indicate
    the size of a fixed-size chunk buffer.  When no buffering argument is
    given, the default buffering policy works as follows:
    
    * Binary files are buffered in fixed-size chunks; the size of the buffer
      is chosen using a heuristic trying to determine the underlying device's
      "block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
      On many systems, the buffer will typically be 4096 or 8192 bytes long.
    
    * "Interactive" text files (files for which isatty() returns True)
      use line buffering.  Other text files use the policy described above
      for binary files.
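The buffering rules quoted above can be checked with a small sketch (the scratch file demo_buffering.txt is hypothetical). Mode 'r' is used for the failing call so that the file is not truncated before the ValueError is raised:

```python
path = "demo_buffering.txt"  # hypothetical scratch file

# buffering=0 (unbuffered) is accepted in binary mode.
with open(path, "wb", buffering=0) as f:
    f.write(b"raw bytes\n")

# In text mode, buffering=0 is rejected with ValueError.
rejected = False
try:
    open(path, "r", buffering=0)
except ValueError:
    rejected = True

# buffering=1 selects line buffering in text mode: writing '\n' flushes the buffer.
with open(path, "w", encoding="utf-8", buffering=1) as f:
    line_buffered = f.line_buffering
    f.write("line\n")

print(rejected, line_buffered)
```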

newline:

    newline controls how universal newlines works (it only applies to text
    mode). It can be None, '', '\n', '\r', and '\r\n'.  It works as
    follows:
    
    * On input, if newline is None, universal newlines mode is
      enabled. Lines in the input can end in '\n', '\r', or '\r\n', and
      these are translated into '\n' before being returned to the
      caller. If it is '', universal newline mode is enabled, but line
      endings are returned to the caller untranslated. If it has any of
      the other legal values, input lines are only terminated by the given
      string, and the line ending is returned to the caller untranslated.
    
    * On output, if newline is None, any '\n' characters written are
      translated to the system default line separator, os.linesep. If
      newline is '' or '\n', no translation takes place. If newline is any
      of the other legal values, any '\n' characters written are translated
      to the given string.
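The input and output rules above can be seen with a file that mixes Windows and Unix line endings. A minimal sketch, assuming a hypothetical scratch file demo_newline.txt:

```python
path = "demo_newline.txt"  # hypothetical scratch file

# Raw bytes with a Windows '\r\n' ending and a Unix '\n' ending.
with open(path, "wb") as f:
    f.write(b"one\r\ntwo\n")

# newline=None (default): universal newlines, '\r\n' is translated to '\n'.
with open(path, "r", encoding="utf-8") as f:
    translated = f.readlines()

# newline='': lines are still split universally, but endings are returned untranslated.
with open(path, "r", encoding="utf-8", newline="") as f:
    untranslated = f.readlines()

# On output, newline='\r\n' turns every '\n' written into '\r\n'.
with open(path, "w", encoding="utf-8", newline="\r\n") as f:
    f.write("a\nb\n")
with open(path, "rb") as f:
    raw = f.read()

print(translated, untranslated, raw)
```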

Examples:

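1. First create the test file and save it with UTF-8 encoding: open the original .txt, type the text, "Save As" with UTF-8 encoding, and overwrite the original .txt. [Saving a file with UTF-8 encoding]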
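The file can also be created from Python instead of an editor. A minimal sketch, assuming the hypothetical names test.txt and test_bom.txt and hypothetical sample text (the article's own file content is not shown). It also illustrates a related pitfall: some editors save "UTF-8" with a byte order mark (BOM), which encoding="utf-8" keeps as '\ufeff' while "utf-8-sig" strips it:

```python
path = "test.txt"            # hypothetical file name
sample = "你好\nhello\n"     # hypothetical sample content

# Plain UTF-8 without a BOM: the programmatic equivalent of "Save As UTF-8".
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# "utf-8-sig" writes a BOM, simulating an editor that adds one.
bom_path = "test_bom.txt"    # hypothetical file name
with open(bom_path, "w", encoding="utf-8-sig") as f:
    f.write(sample)

with open(bom_path, "r", encoding="utf-8") as f:
    with_bom = f.read()      # the BOM survives as '\ufeff' at the start
with open(bom_path, "r", encoding="utf-8-sig") as f:
    without_bom = f.read()   # the BOM is stripped

print(repr(with_bom[0]), without_bom == sample)
```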

2. UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 54: illegal multibyte sequence

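This error usually means that encoding was not specified. open() then falls back to the locale's default encoding (GBK, i.e. cp936, on Simplified Chinese Windows), and the UTF-8 bytes in the file are not valid GBK. For example: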

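path = 'test.txt'  # hypothetical path to the UTF-8 sample file
f1 = open(path, 'r')  # no encoding given: the locale's default encoding is used
a = f1.read()    # read() loads the whole file at once; for large files prefer readline() or read(1024), where 1024 is a number of characters in text mode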
# UnicodeDecodeError: 'gbk' codec can't decode byte 0xa4 in position 54: illegal multibyte sequence
print(a)
f1.close()

Fix: pass the encoding the file was saved with:

f2 = open(path, 'r', encoding='utf-8')
a = f2.read()    # read() loads the whole file at once; for large files prefer readline() or read(1024), where 1024 is a number of characters in text mode
print(a)
f2.close()
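Because the default encoding depends on the machine, the error above may not appear everywhere. A minimal sketch that reproduces it on any platform by naming the GBK codec explicitly (the scratch file name is hypothetical):

```python
import locale

path = "demo_gbk_error.txt"   # hypothetical scratch file
with open(path, "w", encoding="utf-8") as f:
    f.write("中")             # UTF-8 bytes: e4 b8 ad

# The encoding open() uses when encoding is omitted (e.g. 'cp936' on Simplified Chinese Windows).
print(locale.getpreferredencoding(False))

# Decoding UTF-8 bytes as GBK fails: 3 bytes cannot form complete 2-byte GBK sequences.
error = None
try:
    with open(path, "r", encoding="gbk") as f:
        f.read()
except UnicodeDecodeError as exc:
    error = exc
    print("gbk failed:", exc)

# Decoding with the encoding the file was written in succeeds.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()
print(text)
```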

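3. f.read() and f.read(size)

f.read(): reads the whole file at once (from the current position to the end)

f.read(size): reads at most size characters in text mode (size bytes in binary mode)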

# ------------------------------
f2 = open(path, 'r', encoding='utf-8')
a = f2.read()    # read() loads the whole file at once; for large files prefer readline() or read(1024), where 1024 is a number of characters in text mode
print(a)
f2.close()

# ------------------------------
f3 = open(path, 'r', encoding='utf-8')
a = f3.read(4)    # read(size) reads at most size characters in text mode; useful, like readline(), for large files
print(a)
f3.close()
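The difference between characters and bytes matters for Chinese text, where each character takes 3 bytes in UTF-8. A minimal sketch (hypothetical scratch file) that compares text and binary mode and reads a file in fixed-size pieces:

```python
path = "demo_read_size.txt"   # hypothetical scratch file
with open(path, "w", encoding="utf-8") as f:
    f.write("你好abc")

# Text mode: size counts characters.
with open(path, "r", encoding="utf-8") as f:
    first_text = f.read(1)        # '你'

# Binary mode: size counts bytes ('你' alone is 3 bytes in UTF-8).
with open(path, "rb") as f:
    first_bytes = f.read(1)       # b'\xe4'

# Reading piece by piece keeps memory use bounded for large files.
chunks = []
with open(path, "r", encoding="utf-8") as f:
    while True:
        chunk = f.read(2)         # tiny size for the demo; e.g. 1024 in practice
        if not chunk:             # read() returns '' at end of file
            break
        chunks.append(chunk)

print(first_text, first_bytes, chunks)  # 你 b'\xe4' ['你好', 'ab', 'c']
```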

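4. f.readline() and f.readlines()

f.readline(): reads one line per call (the trailing newline is kept); note where the file pointer ends up

f.readlines(): reads all remaining lines at once and returns them as a list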

# ------------------------------
f4 = open(path, 'r', encoding='utf-8')
a = f4.readline()    # readline() reads one line per call
print(a)
f4.close()


# -----------------------------
f5 = open(path, 'r', encoding='utf-8')
a = f5.readlines()    # readlines() reads all lines into a list that can be iterated
print('a:', a)
f5.close()
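A short sketch of how readline() signals the end of the file, and of iterating over the file object directly, which yields one line at a time instead of building a list like readlines() (the scratch file name is hypothetical):

```python
path = "demo_lines.txt"   # hypothetical scratch file
with open(path, "w", encoding="utf-8") as f:
    f.write("line 1\nline 2\n")

with open(path, "r", encoding="utf-8") as f:
    first = f.readline()      # 'line 1\n': the newline is kept
    second = f.readline()     # 'line 2\n'
    at_eof = f.readline()     # '' signals the end of the file

# Iterating over the file object reads lazily, one line per step.
with open(path, "r", encoding="utf-8") as f:
    stripped = [line.rstrip("\n") for line in f]

print(repr(first), repr(at_eof), stripped)
```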

5. f.tell(): returns the current position of the file pointer; mind the newline characters

f7 = open(path, 'r', encoding='utf-8')
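a = f7.readline()    # readline() reads the first line
print(a)
cc = f7.tell()   # current position; in text mode this is an opaque number that in practice matches the byte offset for UTF-8 files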
print(cc)
f7.close()
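Why newlines (and Chinese characters) affect the number returned by tell() is easiest to see in binary mode, where tell() is documented as a plain byte offset. A minimal sketch with a hypothetical scratch file containing a Windows line ending:

```python
path = "demo_tell.txt"   # hypothetical scratch file
# '中' is 3 bytes in UTF-8, and the Windows line ending '\r\n' is 2 bytes.
with open(path, "wb") as f:
    f.write("中".encode("utf-8") + b"\r\nab")

with open(path, "rb") as f:
    line = f.readline()    # b'\xe4\xb8\xad\r\n'
    pos = f.tell()         # 5 = 3 bytes for '中' + 2 bytes for '\r\n'

# In text mode the same line comes back as '中\n', yet it still occupies 5 bytes in the file.
with open(path, "r", encoding="utf-8") as f:
    text_line = f.readline()

print(pos, repr(text_line))
```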

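6. f.writelines(): writes a sequence of strings (for example a list) to the file. It does not add line separators, so include a newline at the end of each line yourself.

r+: opens a file for reading and writing without truncating it; the file pointer starts at the beginning.

w+: opens a file for reading and writing. If the file exists, it is truncated (the original content is deleted) and editing starts from the beginning; if it does not exist, a new file is created.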

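f8 = open(path, 'r+', encoding='utf-8')   # r+: read/write, the file pointer starts at the beginning
a = f8.readlines()         # readlines() reads all lines into a list that can be iterated
print('a:', a)
f8.writelines(['\n陳王昔時宴平樂，斗酒十千恣戲虐\n','哎，李白'])   # write a list of strings; the pointer is already at the end, so this effectively appends
dd = f8.read()
print('dd:', dd)  # the pointer is at the end, so this is empty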
f8.close()
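A minimal sketch of the "add the newline yourself" rule for writelines(), using a hypothetical scratch file and hypothetical lines:

```python
path = "demo_writelines.txt"   # hypothetical scratch file
lines = ["first", "second"]    # hypothetical content

with open(path, "w", encoding="utf-8") as f:
    f.writelines(lines)                          # no separators: 'firstsecond'
with open(path, "r", encoding="utf-8") as f:
    glued = f.read()

with open(path, "w", encoding="utf-8") as f:
    f.writelines(line + "\n" for line in lines)  # any iterable of strings works
with open(path, "r", encoding="utf-8") as f:
    separated = f.read()

print(repr(glued), repr(separated))
```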

7. f.write(): writes a single string and returns the number of characters written

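f9 = open(path, 'r+', encoding='utf-8')  # r+: read/write, the file pointer starts at the beginning
a = f9.readlines()    # readlines() reads all lines into a list that can be iterated
print('a:', a)
f9.write('\n陳王昔時宴平樂，斗酒十千恣戲虐')  # the read above moved the pointer to the end, so the write also happens at the end
b = f9.read()
print('b:', b)  # empty, because of where the file pointer is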
f9.close()
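Unlike 'w', 'r+' does not truncate, so writing before any read overwrites the start of the file in place. A minimal sketch with a hypothetical scratch file:

```python
path = "demo_rplus.txt"   # hypothetical scratch file
with open(path, "w", encoding="utf-8") as f:
    f.write("abcdef")

with open(path, "r+", encoding="utf-8") as f:
    written = f.write("XY")   # pointer starts at 0, so 'ab' is overwritten; returns 2
    f.seek(0, 2)              # move to the end (offset 0 from whence=2)
    f.write("!")              # this write appends

with open(path, "r", encoding="utf-8") as f:
    result = f.read()

print(written, result)  # 2 XYcdef!
```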

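8. f.seek(): sets the position of the file pointer, f.seek(offset[, whence])

a+: opens a file for reading and appending. If the file exists, the file pointer is placed at the end and the file is opened in append mode; if it does not exist, a new file is created for reading and writing.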

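f10 = open(path, 'r', encoding='utf-8')
a = f10.readlines()    # readlines() reads all lines into a list that can be iterated
print('a:', a)
f10.close()
f11 = open(path, 'a+', encoding='utf-8')   # a+: read and append
f11.write('\n陳王昔時宴平樂，斗酒十千恣戲虐')
b = f11.read()
print('b:', b)    # empty: the pointer is at the end of the file; move it before reading
# f.seek(offset[, whence]) moves the file pointer. offset is in bytes (not bits);
# whence: 0 = start of file (default), 1 = current position, 2 = end of file.
# Negative or non-zero relative offsets only work in binary mode; in text mode use 0 or a value returned by tell().
f11.seek(6, 0)  # 6 must fall on a character boundary, otherwise decoding fails or yields garbage
b = f11.read()
print('b:', b)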
f11.close()
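A minimal sketch of the seek() restrictions described in the comments above, using a hypothetical scratch file whose first two characters are 3-byte UTF-8 characters:

```python
import io

path = "demo_seek.txt"   # hypothetical scratch file
with open(path, "w", encoding="utf-8") as f:
    f.write("你好abc")   # '你' and '好' take 3 bytes each

# Binary mode: offsets are byte counts and may be negative relative to whence 1 or 2.
with open(path, "rb") as f:
    f.seek(6, 0)              # skip the two 3-byte characters
    after_six = f.read()      # b'abc'
    f.seek(-2, 2)             # 2 bytes before the end
    last_two = f.read()       # b'bc'

# Text mode: non-zero end-relative (or current-relative) seeks are rejected.
rejected = False
with open(path, "r", encoding="utf-8") as f:
    try:
        f.seek(-2, 2)
    except io.UnsupportedOperation:
        rejected = True
    f.seek(6, 0)              # works only because byte 6 is a character boundary
    text_after_six = f.read() # 'abc'

print(after_six, last_two, rejected, text_after_six)
```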

9. 文件指針

# File pointer
f12 = open(path, 'a+', encoding='utf-8')   # a+: read and append
f12.write('\n陳王昔時宴平樂，斗酒十千恣戲虐')
c = f12.tell()   # current position: in a+ mode the write happens at the end, so the pointer is at the end (269 in the author's run)
print('c:', c)
f12.seek(6, 0)  # move the pointer 6 bytes from the start of the file (whence 0 = start of file)
d = f12.tell()   # after seeking: 6
print('d:', d)
e = f12.read()  # read to the end; the pointer is at the end again
print('e:', e)
f = f12.tell()   # back at the end: 269 in the author's run
print('f:', f)
f12.close()
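A common use of tell() and seek() together is to save a position and return to it later. A minimal sketch with a hypothetical scratch file; in text mode the value from tell() should only be passed back to seek(), not used for arithmetic:

```python
path = "demo_bookmark.txt"   # hypothetical scratch file
with open(path, "w", encoding="utf-8") as f:
    f.write("header\nbody 1\nbody 2\n")

with open(path, "r", encoding="utf-8") as f:
    f.readline()              # skip the header line
    bookmark = f.tell()       # save the current position
    first_pass = f.read()
    f.seek(bookmark)          # jump back to the saved position
    second_pass = f.read()

print(first_pass == second_pass)  # True
```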

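10. with open: no explicit f.close() is needed; the file is closed automatically when the with block ends, even if an exception is raised

f.__next__(): reads the next line (usually called as next(f))

# ------------------------------------
# with open: the file is closed automatically, no explicit f.close() needed
with open(path, 'r', encoding='utf-8') as f13:
    a = f13.readlines()    # readlines() reads all lines into a list
    print(a)

# Next line: f.__next__()
with open(path, 'r', encoding='utf-8') as f13:
    a = f13.readline()    # readline() reads one line
    print('a:', a)
    b = f13.__next__()    # equivalent to next(f13): returns the following line
    print('b:', b)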
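The built-in next() is the idiomatic way to call __next__(). A minimal sketch with a hypothetical scratch file; passing a default to next() avoids StopIteration at the end of the file:

```python
path = "demo_next.txt"   # hypothetical scratch file
with open(path, "w", encoding="utf-8") as f:
    f.write("first\nsecond\n")

with open(path, "r", encoding="utf-8") as f:
    a = next(f)           # same as f.__next__(): 'first\n'
    b = next(f)           # 'second\n'
    c = next(f, None)     # returns the default instead of raising StopIteration

# The file object is itself an iterator over its lines.
with open(path, "r", encoding="utf-8") as f:
    lines = list(f)

print(repr(a), repr(b), c, lines)
```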

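11. Detailed mode parameters (from the Runoob tutorial):

Mode	Description
t	Text mode (default).
x	Exclusive creation: creates a new file for writing; raises an error (FileExistsError) if the file already exists.
b	Binary mode.
+	Opens a file for updating (reading and writing).
U	Universal newline mode (deprecated; removed in Python 3.11).
r	Opens a file read-only. The file pointer is placed at the beginning of the file. This is the default mode.
rb	Opens a file read-only in binary format. The file pointer is placed at the beginning of the file. Typically used for non-text files such as images.
r+	Opens a file for reading and writing. The file pointer is placed at the beginning of the file.
rb+	Opens a file for reading and writing in binary format. The file pointer is placed at the beginning of the file. Typically used for non-text files such as images.
w	Opens a file for writing only. If the file exists, it is truncated (its original content is deleted) and writing starts from the beginning; otherwise a new file is created.
wb	Opens a file for writing only in binary format. If the file exists, it is truncated; otherwise a new file is created. Typically used for non-text files such as images.
w+	Opens a file for reading and writing. If the file exists, it is truncated; otherwise a new file is created.
wb+	Opens a file for reading and writing in binary format. If the file exists, it is truncated; otherwise a new file is created. Typically used for non-text files such as images.
a	Opens a file for appending. If the file exists, the file pointer is placed at the end, so new content is written after the existing content; otherwise a new file is created for writing.
ab	Opens a file for appending in binary format. If the file exists, the file pointer is placed at the end, so new content is written after the existing content; otherwise a new file is created for writing.
a+	Opens a file for reading and appending. If the file exists, the file pointer is placed at the end and the file is opened in append mode; otherwise a new file is created for reading and writing.
ab+	Opens a file for reading and appending in binary format. If the file exists, the file pointer is placed at the end; otherwise a new file is created for reading and writing.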

Text mode is the default; add b to open a file in binary mode.
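In binary mode no decoding happens, so no encoding can be given. A minimal sketch with a hypothetical scratch file; the four bytes are the start of the PNG file signature, used only as example non-text data:

```python
path = "demo_binary.bin"                  # hypothetical scratch file
data = bytes([0x89, 0x50, 0x4E, 0x47])    # example bytes: start of the PNG signature

with open(path, "wb") as f:
    f.write(data)
with open(path, "rb") as f:
    raw = f.read()                        # bytes object, no decoding involved

# Passing an encoding together with 'b' is rejected with ValueError.
rejected = False
try:
    open(path, "rb", encoding="utf-8")
except ValueError:
    rejected = True

print(raw, rejected)
```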

References:

https://www.cnblogs.com/mengyu/p/6638975.html

https://www.runoob.com/python/file-methods.html
