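When I downloaded Udacity courses, the subtitles and the videos came as separate files. I can't yet follow spoken English completely, so subtitles matter a lot to me. If you don't want the explanation, skip to the end, copy the full code, and run it.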
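I compared the vtt and srt formats by opening one file of each in Notepad, and found two main differences:
- A vtt file starts with an extra WEBVTT\n\n identifier
- The timestamp punctuation differs: the "." inside vtt timestamps is a "," in srt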
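To make the two differences concrete, here is a minimal sketch that applies them to a small in-memory sample. The sample cues and the helper name `vtt_text_to_srt_text` are made up for illustration, and the sketch assumes every timestamp includes the hours field.

```python
import re

# A made-up two-cue sample in WebVTT form (illustrative only).
SAMPLE_VTT = (
    "WEBVTT\n"
    "\n"
    "1\n"
    "00:00:00.500 --> 00:00:02.000\n"
    "Hello.\n"
    "\n"
    "2\n"
    "00:00:02.500 --> 00:00:04.000\n"
    "Welcome to the course.\n"
)


def vtt_text_to_srt_text(text):
    """Apply the two differences described above to a string."""
    # Difference 1: drop the leading "WEBVTT" header and the blank line after it.
    text = re.sub(r"\AWEBVTT\n\n", "", text)
    # Difference 2: use "," instead of "." before the milliseconds.
    return re.sub(r"(\d{2}:\d{2}:\d{2})\.(\d{3})", r"\1,\2", text)


if __name__ == "__main__":
    print(vtt_text_to_srt_text(SAMPLE_VTT))
```

Only the timestamps are rewritten; the full stop at the end of a subtitle sentence is left alone because the pattern requires digits on both sides.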
Flowchart: [image not available]
I wrote a simple Python script to convert the files in batch.
-
1 Import dependencies
-
- os: get file information
- sys: read the command-line arguments (args)
- re: match or replace text in the file contents
import os
import sys
import re
-
2 Define the main entry point
-
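if __name__ == '__main__':
    args = sys.argv
    # Expect one argument: a .vtt file or a directory containing .vtt files.
    if len(args) < 2:
        print("usage: pass a .vtt file name or a directory")
    elif os.path.isdir(args[1]):
        # Directory: convert every .vtt file in it.
        file_list = get_file_name(args[1], ".vtt")
        for file in file_list:
            vtt2srt(file)
    elif os.path.isfile(args[1]):
        # Single file: convert it directly.
        vtt2srt(args[1])
        print('done')
    else:
        print("args[1] should be a file name or a directory")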
-
3 Define get_file_name, the function that collects file names
-
def get_file_name(dir_path, file_extension):
    # Return the full paths of all entries in dir_path that end with file_extension.
    f_list = os.listdir(dir_path)
    result_list = []
    for file_name in f_list:
        if os.path.splitext(file_name)[1] == file_extension:
            result_list.append(os.path.join(dir_path, file_name))
    return result_list
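The same listing could also be written with `pathlib`. This is an alternative sketch rather than the script's own code: the name `find_subtitle_files` and the `recursive` flag are hypothetical additions, and unlike the function above it skips directories whose names happen to end with the extension.

```python
from pathlib import Path


def find_subtitle_files(dir_path, file_extension=".vtt", recursive=False):
    """Return sorted paths (as strings) of files in dir_path with the given extension."""
    root = Path(dir_path)
    # rglob walks subdirectories as well; iterdir only lists the top level.
    candidates = root.rglob("*") if recursive else root.iterdir()
    return sorted(
        str(p) for p in candidates
        if p.is_file() and p.suffix == file_extension
    )
```

Passing `recursive=True` would be handy when a course is split into one folder per lesson.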
-
4 Define the conversion logic
-
def vtt2srt(file_name):
    with open(file_name, "r", encoding="utf-8") as f:
        content = f.read()
    # Remove the WEBVTT header line
    content = re.sub(r"WEBVTT\n\n", '', content)
    # Replace "." with "," in timestamps (the dot is escaped so it matches literally)
    content = re.sub(r"(\d{2}:\d{2}:\d{2})\.(\d{3})", r"\1,\2", content)
    output_file = os.path.splitext(file_name)[0] + '.srt'
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(content)


def srt2vtt(file_name):
    with open(file_name, "r", encoding="utf-8") as f:
        content = f.read()
    # Add the WEBVTT header line
    content = "WEBVTT\n\n" + content
    # Replace "," with "." in timestamps
    content = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", content)
    output_file = os.path.splitext(file_name)[0] + '.vtt'
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(content)
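One detail worth checking in the timestamp pattern: in a regular expression an unescaped `.` matches any character, so the separator before the milliseconds should be written as `\.`. A small sketch with made-up input strings shows the difference; the names `UNESCAPED`, `ESCAPED`, and `to_srt_timestamps` are only for this illustration.

```python
import re

UNESCAPED = re.compile(r"(\d{2}:\d{2}:\d{2}).(\d{3})")   # "." matches any character
ESCAPED = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{3})")    # "\." matches a literal dot


def to_srt_timestamps(pattern, text):
    """Replace the separator before the milliseconds with a comma."""
    return pattern.sub(r"\1,\2", text)


vtt_time = "00:01:02.345"
# Made-up string that is NOT a VTT timestamp: the separator is a space.
not_a_time = "12:34:56 789"

print(to_srt_timestamps(ESCAPED, vtt_time))      # 00:01:02,345
print(to_srt_timestamps(ESCAPED, not_a_time))    # unchanged: 12:34:56 789
print(to_srt_timestamps(UNESCAPED, not_a_time))  # 12:34:56,789 (unintended match)
```

In real subtitle files the unescaped version will usually give the same result, but escaping the dot keeps the substitution from touching text it was never meant to change.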
-
5 Full code
-
import os
import sys
import re


def get_file_name(dir_path, file_extension):
    # Return the full paths of all entries in dir_path that end with file_extension.
    f_list = os.listdir(dir_path)
    result_list = []
    for file_name in f_list:
        if os.path.splitext(file_name)[1] == file_extension:
            result_list.append(os.path.join(dir_path, file_name))
    return result_list


def vtt2srt(file_name):
    with open(file_name, "r", encoding="utf-8") as f:
        content = f.read()
    # Remove the WEBVTT header line
    content = re.sub(r"WEBVTT\n\n", '', content)
    # Replace "." with "," in timestamps (the dot is escaped so it matches literally)
    content = re.sub(r"(\d{2}:\d{2}:\d{2})\.(\d{3})", r"\1,\2", content)
    output_file = os.path.splitext(file_name)[0] + '.srt'
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(content)


def srt2vtt(file_name):
    with open(file_name, "r", encoding="utf-8") as f:
        content = f.read()
    # Add the WEBVTT header line
    content = "WEBVTT\n\n" + content
    # Replace "," with "." in timestamps
    content = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", content)
    output_file = os.path.splitext(file_name)[0] + '.vtt'
    with open(output_file, "w", encoding="utf-8") as f:
        f.write(content)


if __name__ == '__main__':
    args = sys.argv
    # Expect one argument: a .vtt file or a directory containing .vtt files.
    if len(args) < 2:
        print("usage: pass a .vtt file name or a directory")
    elif os.path.isdir(args[1]):
        file_list = get_file_name(args[1], ".vtt")
        for file in file_list:
            vtt2srt(file)
    elif os.path.isfile(args[1]):
        vtt2srt(args[1])
        print('done')
    else:
        print("args[1] should be a file name or a directory")
Notes:
- To avoid path errors, pass the absolute path of the folder.
- The code targets Python 3.x.
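If typing an absolute path is inconvenient, the script could resolve whatever path it receives before using it. A minimal sketch of that idea; `resolve_input_path` is a hypothetical helper and not part of the script above.

```python
import os


def resolve_input_path(raw_path):
    """Expand "~" and turn a relative path into an absolute one."""
    return os.path.abspath(os.path.expanduser(raw_path))


if __name__ == "__main__":
    # A relative path is resolved against the current working directory.
    print(resolve_input_path("."))
```

Calling it on the command-line argument before the `os.path.isdir` check would let relative paths work as long as the script is started from the expected directory.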