Extracting Chinese characters from files with Python

Reposted article · 2021-06-02 15:55 · Python

Read every file under a given directory and extract all the Chinese characters they contain.

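# -*- coding: utf-8 -*-

import os
import re

# Runs of characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF).
HAN_RE = re.compile('[\u4e00-\u9fff]+')


def each_file(filepath, out):
    """Walk filepath recursively and process every file found."""
    for root, dirs, files in os.walk(filepath):
        for file in files:
            filename = os.path.join(root, file)
            read_file(filename, out)


def read_file(filename, out):
    """Write each run of Chinese characters in filename to out, one per line."""
    # errors='ignore' skips bytes that are not valid UTF-8 (e.g. in binary files).
    with open(filename, 'r', encoding='utf-8', errors='ignore') as fn:
        for val in HAN_RE.findall(fn.read()):
            out.write(val + "\n")


if __name__ == '__main__':
    # Open the output explicitly as UTF-8; the platform default encoding
    # may not be able to represent Chinese text.
    with open("word.txt", "w", encoding="utf-8") as fo:
        each_file("src", fo)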
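
To see what the character class `[\u4e00-\u9fff]+` matches on its own, here is a minimal sketch that runs it against an in-memory string instead of files. The helper name `extract_han` and the sample text are made up for illustration. The range covers only the basic CJK Unified Ideographs block, so Latin letters, digits, and punctuation split the text into separate runs.

```python
import re

# Basic CJK Unified Ideographs block, U+4E00 to U+9FFF
# (the same range the script above uses).
HAN_RUN = re.compile('[\u4e00-\u9fff]+')


def extract_han(text):
    """Return every run of consecutive Chinese characters in text.

    Hypothetical helper for illustration; not part of the original script.
    """
    return HAN_RUN.findall(text)


# Made-up sample line mixing code, ASCII text, and Chinese.
sample = 'title = "漢字"; print("hello, 世界") # 註釋abc中文123'
print(extract_han(sample))  # ['漢字', '世界', '註釋', '中文']
```

Keeping the pattern in a small function like this makes it easy to check on short strings before pointing the script at a whole directory.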
