Extracting Chinese Characters from Files with Python

Reposted article, 2021-06-02 15:55, Python

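The script below walks a given directory, reads every file under it, and writes the Chinese text it finds to word.txt. As written, the regular expression only matches runs of Chinese characters enclosed in double quotes.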

# -*- coding: utf-8 -*-

import os
import io
import re

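# Output file; UTF-8 is set explicitly so the result does not depend on the platform default
fo = io.open("word.txt", "w", encoding="utf-8")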

# Walk the given directory and pass every file under it to read_file
def each_file(filepath):
  for root, dirs, files in os.walk(filepath):
    for file in files:
      filename = os.path.join(root, file)
      read_file(filename)

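# Read one file and write every match to the output file, one per line
def read_file(filename):
  with io.open(filename, 'r', encoding='utf-8', errors='ignore') as fn:
    text = fn.read()
  # Matches runs of CJK Unified Ideographs (U+4E00 to U+9FFF) wrapped in double quotes;
  # drop the two '"' characters from the pattern to match every run of Chinese characters
  han = re.compile('"[\u4e00-\u9fff]+"').findall(text)
  for val in han:
    fo.write(val + "\n")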



if __name__ == '__main__':
  try:
    each_file("src")
  finally:
    fo.close()  # make sure the collected matches are flushed to word.txt
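
To see what the script's pattern actually captures, here is a minimal sketch that runs it against an in-memory string instead of files. The sample line and the names QUOTED_HAN and ANY_HAN are made up for illustration; the unquoted variant is an assumption about what "all Chinese characters" might mean, not part of the original script.

```python
import re

# The script's pattern: Chinese runs (U+4E00 to U+9FFF) wrapped in double quotes
QUOTED_HAN = re.compile('"[\u4e00-\u9fff]+"')
# Assumed variant: the same range without the quotes, matching every run
ANY_HAN = re.compile('[\u4e00-\u9fff]+')

# Hypothetical sample line, as it might appear in a source file under "src"
sample = 'title = "你好世界"  # 注释 here, label = "OK"'

quoted = QUOTED_HAN.findall(sample)
any_run = ANY_HAN.findall(sample)

print(quoted)   # ['"你好世界"']  (the quotes are part of each match)
print(any_run)  # ['你好世界', '注释']
```

With the quoted pattern, the Chinese text in the comment is skipped and the surrounding quotes end up in word.txt. If the goal is every Chinese character in the file, the unquoted pattern is probably the better fit. Note that the U+4E00 to U+9FFF range covers the basic CJK Unified Ideographs block only; rarer characters in the extension blocks and CJK punctuation are not matched by either pattern.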
