Converting between VTT and SRT subtitles


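The Udacity course I downloaded ships its subtitles separately from the videos. I still can't follow spoken English completely, so subtitles matter to me. If you don't need the explanation, skip to the end, copy the code, and run it.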

 

I compared the two formats by opening a .vtt and a .srt file in Notepad and found two main differences:

 

  1. The VTT file begins with an extra WEBVTT\n\n identifier
  2. The timestamp punctuation differs: the "." in VTT is a "," in SRT
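To make the two differences concrete, here is a small sketch with a made-up one-cue subtitle written in both formats. The sample text and the variable names are illustrative assumptions, not taken from a real Udacity file.

```python
# A made-up one-cue subtitle, written in both formats for comparison.
vtt_sample = (
    "WEBVTT\n"
    "\n"
    "1\n"
    "00:00:01.500 --> 00:00:04.000\n"
    "Hello and welcome.\n"
)

srt_sample = (
    "1\n"
    "00:00:01,500 --> 00:00:04,000\n"
    "Hello and welcome.\n"
)

# Difference 1: only the VTT text starts with the WEBVTT header.
print(vtt_sample.startswith("WEBVTT\n\n"))
print(srt_sample.startswith("WEBVTT"))

# Difference 2: the millisecond separator is "." in VTT and "," in SRT.
print("00:00:01.500" in vtt_sample, "00:00:01,500" in srt_sample)
```

The sentence in the cue also ends with a ".", which is why the conversion should touch only the timestamps and not every dot in the file.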


I wrote a simple Python script to convert the files in batch.

 

  • 1 Import the dependencies

    1. os, to get file information
    2. sys, to read the command-line args
    3. re, to match and replace the file contents
import os
import sys
import re
  • 2 Define the main function

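    if __name__ == '__main__':
        args = sys.argv

        # The first argument must be a .vtt file or a directory of .vtt files
        if len(args) < 2:
            sys.exit("pass a .vtt file or a directory as the first argument")

        if os.path.isdir(args[1]):
            # Convert every .vtt file found in the directory
            file_list = get_file_name(args[1], ".vtt")
            for file in file_list:
                vtt2srt(file)

        elif os.path.isfile(args[1]):
            vtt2srt(args[1])
        else:
            print("arg[1] should be a file name or dir")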
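The main block above always calls vtt2srt, so srt2vtt is never reached from the command line. One possible way to cover both directions is to pick the converter from the file extension. The sketch below is an assumption of mine, not part of the original script: pick_converter and CONVERTERS are hypothetical names, and the two converters are stand-ins that only return their own name.

```python
import os


def vtt2srt(file_name):
    # Stand-in for the real converter defined in step 4.
    return "vtt2srt"


def srt2vtt(file_name):
    # Stand-in for the real converter defined in step 4.
    return "srt2vtt"


# Hypothetical mapping from lower-cased extension to converter.
CONVERTERS = {".vtt": vtt2srt, ".srt": srt2vtt}


def pick_converter(file_name):
    """Return the converter for file_name, or None if the extension is unknown."""
    extension = os.path.splitext(file_name)[1].lower()
    return CONVERTERS.get(extension)


for name in ["lesson1.vtt", "lesson1.SRT", "notes.txt"]:
    converter = pick_converter(name)
    print(name, "->", converter(name) if converter else "skipped")
```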

     

  • 3 Define get_file_name to collect the matching file names

    def get_file_name(dir_path, file_extension):
        # List every entry in the directory
        f_list = os.listdir(dir_path)

        result_list = []

        # Keep only the entries whose extension matches, as full paths
        for file_name in f_list:
            if os.path.splitext(file_name)[1] == file_extension:
                result_list.append(os.path.join(dir_path, file_name))
        return result_list
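As a quick check of the idea, the sketch below lists files by extension inside a temporary folder. It uses pathlib instead of os.listdir and compares extensions case-insensitively, so a file named b.VTT is also picked up. Both choices, and the name find_files, are my assumptions rather than part of the original script.

```python
import tempfile
from pathlib import Path


def find_files(directory, file_extension):
    """Return sorted paths in directory whose suffix equals file_extension, ignoring case."""
    wanted = file_extension.lower()
    return sorted(
        str(path)
        for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() == wanted
    )


# Build a throwaway folder with a few empty files and list the .vtt ones.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["a.vtt", "b.VTT", "c.srt", "readme.txt"]:
        (Path(tmp) / name).write_text("", encoding="utf-8")
    found = [Path(p).name for p in find_files(tmp, ".vtt")]
    print(found)
```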

     

  • 4 Define the conversion logic

    def vtt2srt(file_name):
        with open(file_name, "r", encoding="utf-8") as f:
            content = f.read()

        # Remove the WEBVTT header line
        content = re.sub(r"WEBVTT\n\n", '', content)
        # Replace "." with "," in timestamps (the dot is escaped so it matches only ".")
        content = re.sub(r"(\d{2}:\d{2}:\d{2})\.(\d{3})", r"\1,\2", content)

        output_file = os.path.splitext(file_name)[0] + '.srt'
        with open(output_file, "w", encoding="utf-8") as f:
            f.write(content)


    def srt2vtt(file_name):
        with open(file_name, "r", encoding="utf-8") as f:
            content = f.read()

        # Add the WEBVTT header line
        content = "WEBVTT\n\n" + content
        # Replace "," with "." in timestamps
        content = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", content)

        output_file = os.path.splitext(file_name)[0] + '.vtt'
        with open(output_file, "w", encoding="utf-8") as f:
            f.write(content)
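To try the conversion logic without touching any files, the same two substitutions can be applied to strings. This is a minimal sketch: vtt_to_srt_text and srt_to_vtt_text are hypothetical helper names, the sample cue is made up, and the patterns assume timestamps in the full hh:mm:ss.mmm form. The dot in the VTT pattern is escaped because an unescaped "." in a regular expression matches any character, not just a literal dot.

```python
import re

# Timestamp patterns: hours:minutes:seconds, a separator, then milliseconds.
VTT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2})\.(\d{3})")
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def vtt_to_srt_text(content):
    """Drop a leading WEBVTT header and switch '.' to ',' in timestamps."""
    content = re.sub(r"\AWEBVTT\n\n", "", content)
    return VTT_TIME.sub(r"\1,\2", content)


def srt_to_vtt_text(content):
    """Prepend the WEBVTT header and switch ',' to '.' in timestamps."""
    return "WEBVTT\n\n" + SRT_TIME.sub(r"\1.\2", content)


vtt = "WEBVTT\n\n1\n00:00:01.500 --> 00:00:04.000\nHello, and welcome.\n"
srt = vtt_to_srt_text(vtt)
print(srt)
print(srt_to_vtt_text(srt) == vtt)  # the round trip gives back the original text
```

Because the patterns require digits on both sides of the separator, the comma and the full stop inside the spoken sentence are left alone.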

     

  • 5 Full code

    import os
    import sys
    import re


    def get_file_name(dir_path, file_extension):
        # List every entry in the directory
        f_list = os.listdir(dir_path)

        result_list = []

        # Keep only the entries whose extension matches, as full paths
        for file_name in f_list:
            if os.path.splitext(file_name)[1] == file_extension:
                result_list.append(os.path.join(dir_path, file_name))
        return result_list


    def vtt2srt(file_name):
        with open(file_name, "r", encoding="utf-8") as f:
            content = f.read()

        # Remove the WEBVTT header line
        content = re.sub(r"WEBVTT\n\n", '', content)
        # Replace "." with "," in timestamps (the dot is escaped so it matches only ".")
        content = re.sub(r"(\d{2}:\d{2}:\d{2})\.(\d{3})", r"\1,\2", content)

        output_file = os.path.splitext(file_name)[0] + '.srt'
        with open(output_file, "w", encoding="utf-8") as f:
            f.write(content)


    def srt2vtt(file_name):
        with open(file_name, "r", encoding="utf-8") as f:
            content = f.read()

        # Add the WEBVTT header line
        content = "WEBVTT\n\n" + content
        # Replace "," with "." in timestamps
        content = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", content)

        output_file = os.path.splitext(file_name)[0] + '.vtt'
        with open(output_file, "w", encoding="utf-8") as f:
            f.write(content)


    if __name__ == '__main__':
        args = sys.argv

        # The first argument must be a .vtt file or a directory of .vtt files
        if len(args) < 2:
            sys.exit("pass a .vtt file or a directory as the first argument")

        if os.path.isdir(args[1]):
            # Convert every .vtt file found in the directory
            file_list = get_file_name(args[1], ".vtt")
            for file in file_list:
                vtt2srt(file)
            print('done')

        elif os.path.isfile(args[1]):
            vtt2srt(args[1])
            print('done')
        else:
            print("arg[1] should be a file name or dir")

    Notes:

    • To avoid path errors, pass the absolute path of the folder

    • The code targets Python 3.x
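If typing an absolute path is inconvenient, one option is to resolve whatever path was given before using it. This is a small sketch built on the standard os.path functions; the helper name resolve_target is hypothetical and is not part of the script above.

```python
import os


def resolve_target(path):
    """Expand '~' and turn a relative path into an absolute one."""
    return os.path.abspath(os.path.expanduser(path))


# A relative folder name is resolved against the current working directory.
print(resolve_target("subtitles"))
```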

 

