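This method removes the non-Chinese characters from a text and then deletes the blank lines that are left behind.
It is first done in two separate steps.
The text file to process: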

當時這個
3
00:00:04,02 --> 00:00:05,13
有喝多的武將
4
00:00:05,18 --> 00:00:07,05
一看許姬很漂亮
5
00:00:07,12 --> 00:00:09,06
就欲行非禮就拽上去
6
00:00:09,09 --> 00:00:10,21
這許姬手也挺快
7
00:00:11,03 --> 00:00:12,14
黑咕隆能看不見呢
8
00:00:12,14 --> 00:00:13,08
順手誇
9
00:00:13,13 --> 00:00:17,05
把這武將頭盔頂上那鷹帶給摘下來了
10
00:00:17,12 --> 00:00:19,03
哎就是頭盔上綁着帶了
The following code removes the non-Chinese characters from the file:

import re

def del_no_china(infile, outfile):
    # Keep only the Chinese characters of each line. Every input line
    # still produces one output line, so a line without any Chinese
    # characters becomes an empty line.
    with open(infile, 'r', encoding="utf-8") as infopen, \
         open(outfile, 'w', encoding="utf-8") as outfopen:
        for line in infopen:
            k = re.findall('[\u4e00-\u9fa5]', line)
            s = ''.join(k)
            outfopen.write(s + "\n")

del_no_china("處理前.txt", "處理中.txt")
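Running the code above produces the following result. Lines that contained no Chinese characters, such as the subtitle index and timestamp lines, come out as empty lines (not shown here):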

當時這個
有喝多的武將
一看許姬很漂亮
就欲行非禮就拽上去
這許姬手也挺快
黑咕隆能看不見呢
順手誇
把這武將頭盔頂上那鷹帶給摘下來了
哎就是頭盔上綁着帶了
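
To see what the regular expression in the first step does to a single line, here is a minimal in-memory sketch. The helper name `keep_chinese` is hypothetical and not part of the original code. The character class `[\u4e00-\u9fa5]` is the same one used above. It matches the basic CJK Unified Ideographs range, so digits, Latin letters, spaces, and punctuation are all dropped.

```python
import re

# Same character class as in del_no_china: the basic CJK Unified Ideographs range.
CHINESE_CHAR = re.compile('[\u4e00-\u9fa5]')

def keep_chinese(line):
    """Return only the Chinese characters of `line` (hypothetical helper)."""
    return ''.join(CHINESE_CHAR.findall(line))

# A timestamp line has no Chinese characters, so nothing is kept.
print(repr(keep_chinese("4 00:00:05,18 --> 00:00:07,05")))  # ''
# A mixed line keeps only its Chinese characters.
print(keep_chinese("一看許姬很漂亮 5"))  # 一看許姬很漂亮
```

Because the timestamp line yields an empty string, writing it followed by a newline is what leaves the blank lines that the second step has to clean up.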
The following code removes the blank lines from the text above:

def delblankline(infile, outfile):
    # Copy only the lines that contain something other than whitespace.
    with open(infile, 'r', encoding="utf-8") as infopen, \
         open(outfile, 'w', encoding="utf-8") as outfopen:
        for line in infopen:
            if line.split():
                outfopen.write(line)

delblankline("處理中.txt", "處理后.txt")
Running the code above produces the following result:

當時這個
有喝多的武將
一看許姬很漂亮
就欲行非禮就拽上去
這許姬手也挺快
黑咕隆能看不見呢
順手誇
把這武將頭盔頂上那鷹帶給摘下來了
哎就是頭盔上綁着帶了
The code for the two steps combined:

import re

def del_no_china(infile, outfile):
    # First function: remove the non-Chinese characters from the text.
    with open(infile, 'r', encoding="utf-8") as infopen, \
         open(outfile, 'w', encoding="utf-8") as outfopen:
        for line in infopen:
            k = re.findall('[\u4e00-\u9fa5]', line)
            s = ''.join(k)
            outfopen.write(s + '\n')  # keep one output line per input line

del_no_china("處理前.txt", "處理中.txt")

def delblankline(infile, outfile):
    # Second function: remove the blank lines from the text.
    with open(infile, 'r', encoding="utf-8") as infopen, \
         open(outfile, 'w', encoding="utf-8") as outfopen:
        for line in infopen:
            if line.split():
                outfopen.write(line)

delblankline("處理中.txt", "處理后.txt")
The final result is the same!
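
The two steps can also be folded into a single pass that never writes the intermediate file. The following is a minimal sketch under that assumption, not the original author's code: `chinese_only_lines` and `clean_file` are hypothetical names, and the sample list stands in for the lines of the input file.

```python
import re

# Same character class as above: the basic CJK Unified Ideographs range.
CHINESE_CHAR = re.compile('[\u4e00-\u9fa5]')

def chinese_only_lines(lines):
    """Yield the Chinese characters of each line, skipping lines that have none."""
    for line in lines:
        kept = ''.join(CHINESE_CHAR.findall(line))
        if kept:  # an empty result would have become a blank line
            yield kept

def clean_file(infile, outfile):
    """Filter `infile` into `outfile` in one pass (hypothetical wrapper)."""
    with open(infile, 'r', encoding='utf-8') as src, \
         open(outfile, 'w', encoding='utf-8') as dst:
        for kept in chinese_only_lines(src):
            dst.write(kept + '\n')

# In-memory check using the first lines of the sample subtitle text.
sample = ["當時這個\n", "3\n", "00:00:04,02 --> 00:00:05,13\n", "有喝多的武將\n"]
print(list(chinese_only_lines(sample)))  # ['當時這個', '有喝多的武將']
```

With this variant, a call such as `clean_file("處理前.txt", "處理后.txt")` would go straight from the raw file to the final one. Keeping the filtering in a generator also lets it be checked on a plain list of strings, as in the last two lines.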