[Python] M3U8 Downloader Script
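Script goals:
1. Take the URL of an M3U8 file as input and produce the video.
2. Download asynchronously, which is much faster. I did not add any locking, because I couldn't be bothered and it makes little difference here.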
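Known facts:
1. An m3u8 file is essentially an index file that lists the download links of the ts files. Each ts file is one segment of the video; download all of them and merge them to get the complete video.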
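To make that structure concrete, here is a minimal sketch of pulling the segment entries out of playlist text. The sample playlist and the helper name `segment_entries` are made up for illustration, and real playlists may carry more tags than shown.

```python
# Hypothetical sample playlist; the segment names are invented for illustration.
SAMPLE_M3U8 = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXTINF:10.0,
seg_000.ts
#EXTINF:10.0,
seg_001.ts
#EXT-X-ENDLIST
"""


def segment_entries(playlist_text):
    """Return the non-blank, non-tag lines of a playlist, in order."""
    entries = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Lines starting with '#' are tags or comments, not segment links
        if not line or line.startswith("#"):
            continue
        entries.append(line)
    return entries


print(segment_entries(SAMPLE_M3U8))
```

Everything that is not a `#` line is treated as a segment link, which is the same rule the script uses.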
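Approach:
1. Create a folder to hold the downloaded m3u8 file and the downloaded ts files.
2. Download and open the m3u8 file, then download the ts files it lists. Two cases are handled: (a) the ts links are complete URLs; (b) the ts links have to be joined onto a base URL.
3. Check against the m3u8 file whether every ts file was downloaded.
4. ts file names often follow no pattern, so open the m3u8 file again and, following the order in it, append each ts file to a new ts file.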
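The two cases in step 2 can often be told apart without asking the user. As a sketch (this is not what the script does), `urllib.parse.urljoin` from the standard library resolves an entry against the playlist URL: a complete URL comes back unchanged, and a relative entry is joined onto the playlist's directory. The URLs and the helper name `resolve_segment_url` are placeholders.

```python
from urllib.parse import urljoin


def resolve_segment_url(playlist_url, entry):
    """Resolve a playlist entry against the URL the playlist was fetched from."""
    return urljoin(playlist_url, entry)


# Placeholder URLs, used only for illustration
playlist_url = "https://example.com/hls/video/index.m3u8"

# Case (b): a relative entry is joined onto the playlist's directory
print(resolve_segment_url(playlist_url, "seg_000.ts"))

# Case (a): a complete URL is returned as-is
print(resolve_segment_url(playlist_url, "https://cdn.example.com/a/seg_000.ts"))
```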
Implementation:
First create the folder. A relative path is used here.
def init():
    if os.path.exists("./temp_data"):
        return
    else:
        os.mkdir("./temp_data")
Read the m3u8 download URL and derive the m3u8 file name from it. Assuming the URL is https://xxxxxxx126.net/nos/hls/2019/03/13/1214418271_9xxxxxxx32465d1f4c8_sd.m3u8, the file name is set to "1214418271_9xxxxxxx32465d1f4c8_sd.m3u8".
url = input("Enter the m3u8 file URL > ")
name = url.rsplit("/")[-1]
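Taking the last `/`-separated piece keeps any query string that follows the file name. If the URL might carry one, a hedged alternative is to parse the URL first and take the base name of its path. The helper name `playlist_name` and the URL below are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def playlist_name(url):
    """Return the last path component of a URL, ignoring query and fragment."""
    path = urlsplit(url).path
    return posixpath.basename(path)


# Placeholder URL with a query string, for illustration only
print(playlist_name("https://example.com/hls/2019/03/13/demo_sd.m3u8?token=abc"))
```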
Download the m3u8 file.
def m3u8_files_download(url, name):
    # Download the m3u8 file and save it as temp_data/<name>.txt
    resp = requests.get(url)
    with open(f"temp_data/{name}.txt", mode="wb") as f:
        f.write(resp.content)
    resp.close()
Show the first ts download link so the user can judge whether it is a complete URL or one that has to be joined onto a base URL.
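def get_type(name):
    with open(f"temp_data/{name}.txt", "r") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and tag lines; show the first real entry
            if not line or line.startswith("#"):
                continue
            else:
                print("First entry:", line)
                print("Choose a mode: 1. direct download  2. joined URL")
                choice = input(">")
                return str(choice)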
Write a launcher that creates different tasks depending on the choice. The tasks it creates are asynchronous.
async def starter(choice, name):
    tasks = []
    async with aiohttp.ClientSession() as session:
        if choice == "1":
            with open(f"temp_data/{name}.txt", "r") as f:
                for line in f:
                    line = line.strip()
                    if not line or line.startswith("#"):
                        continue
                    download_url = line
                    file_name = line.split("/")[-1]  # name to save the ts file under
                    tasks.append(download_ts(file_name, download_url, session))
        if choice == "2":
            url = input("Enter the base URL to prepend > ")
            with open(f"temp_data/{name}.txt", "r") as f:
                for line in f:
                    line = line.strip()
                    if not line or line.startswith("#"):
                        continue
                    file_name = line.split("/")[-1]  # name to save the ts file under
                    download_url = url + line
                    tasks.append(download_ts(file_name, download_url, session))
        print("Downloading files.....")
        # asyncio.wait() rejects bare coroutines on Python 3.11+, so gather is used.
        # return_exceptions=True keeps one failed download from aborting the rest.
        await asyncio.gather(*tasks, return_exceptions=True)  # wait for all tasks to finish
        print("Download finished")
Download the ts files, using aiohttp in place of requests.
async def download_ts(file_name, download_url, session):
    # header is the User-Agent dict defined at the top of the full script
    async with session.get(download_url, headers=header) as resp:
        async with aiofiles.open(f"temp_data/{file_name}", mode="wb") as f:
            await f.write(await resp.content.read())
    print(f"File {file_name} downloaded!!")
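The launcher above starts every download at once. For playlists with many segments it may be kinder to the server to cap how many run at the same time. The sketch below shows one way to do that with `asyncio.Semaphore`; `fake_download` stands in for the real aiohttp request, and the limit of 3 is an arbitrary assumption, not something the script uses.

```python
import asyncio


async def fake_download(name, semaphore, active, peak):
    """Stand-in for a real download; records how many run concurrently."""
    async with semaphore:
        active[0] += 1
        peak[0] = max(peak[0], active[0])
        await asyncio.sleep(0.01)  # pretend to wait on the network
        active[0] -= 1
    return name


async def download_all(names, limit=3):
    """Run all downloads, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)
    active, peak = [0], [0]
    results = await asyncio.gather(
        *(fake_download(n, semaphore, active, peak) for n in names)
    )
    return results, peak[0]


names = [f"seg_{i:03d}.ts" for i in range(10)]
results, peak = asyncio.run(download_all(names))
print(results[:2], "peak concurrency:", peak)
```

`asyncio.gather` returns results in the order the coroutines were passed in, so the list lines up with the playlist order even though downloads finish at different times.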
Verify completeness: using the m3u8 file, check whether each ts file exists.
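def verification(name):
    files = []
    with open(f"temp_data/{name}.txt", "r") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Files are saved under the last path segment of the entry
            ts_name = line.split("/")[-1]
            if os.path.exists(f"temp_data/{ts_name}"):
                continue
            else:
                files.append(line)
    print("The following files are missing, please check manually:", files)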
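The same check can be written as a function that returns the missing entries instead of printing them, which makes it easier to retry only those. This is a sketch under assumptions: `missing_segments` and the file names are hypothetical, and a temporary folder stands in for `temp_data`.

```python
import os
import tempfile


def missing_segments(entries, folder):
    """Return the playlist entries whose ts file is not present in folder."""
    missing = []
    for entry in entries:
        ts_name = entry.split("/")[-1]  # files are stored under the last path segment
        if not os.path.exists(os.path.join(folder, ts_name)):
            missing.append(entry)
    return missing


with tempfile.TemporaryDirectory() as folder:
    # Create one of the two expected segments as an empty file
    open(os.path.join(folder, "seg_000.ts"), "wb").close()
    demo_missing = missing_segments(
        ["seg_000.ts", "https://example.com/v/seg_001.ts"], folder
    )

print(demo_missing)
```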
Merge the files. This is done by creating one ts file and writing each segment's binary content into it, in the order given by the m3u8 file.
def merge_ts(file_name):
    new_name = input("Enter a name for the merged file > ")
    with open(f"./{new_name}.ts", "ab+") as f:
        with open(f"temp_data/{file_name}.txt", "r") as f2:
            for line in f2:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                else:
                    ts_name = line.split("/")[-1].strip()
                    try:
                        with open(f"temp_data/{ts_name}", "rb") as f3:
                            f.write(f3.read())
                    except OSError:
                        # Skip segments that are missing or unreadable
                        continue
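A small self-contained sketch of the same ordered merge, run against throwaway files so the effect of the ordering is visible. Unlike the script, it opens the output with "wb" so a rerun does not append to an earlier result, and it returns the names it had to skip; both are choices made for this example. The function name `merge_segments` and the file names are hypothetical.

```python
import os
import tempfile


def merge_segments(entries, folder, output_path):
    """Concatenate segments in playlist order; return the names that were skipped."""
    skipped = []
    with open(output_path, "wb") as out:
        for entry in entries:
            ts_name = entry.split("/")[-1]
            try:
                with open(os.path.join(folder, ts_name), "rb") as seg:
                    out.write(seg.read())
            except FileNotFoundError:
                skipped.append(ts_name)
    return skipped


with tempfile.TemporaryDirectory() as folder:
    # Two fake segments; the playlist order (b, a) differs from name order
    for seg_name, data in [("b.ts", b"BB"), ("a.ts", b"AA")]:
        with open(os.path.join(folder, seg_name), "wb") as f:
            f.write(data)
    out_path = os.path.join(folder, "merged.ts")
    skipped = merge_segments(["b.ts", "a.ts", "c.ts"], folder, out_path)
    with open(out_path, "rb") as f:
        merged = f.read()

print(merged, skipped)
```

The output follows the playlist order rather than the alphabetical order of the file names, which is the reason for reading the m3u8 file a second time.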
Finally, write a main function that runs all of this.
def main():
    init()
    url = input("Enter the m3u8 file URL > ")
    name = url.rsplit("/")[-1]
    m3u8_files_download(url, name)  # download the m3u8 file
    choice = get_type(name)
    asyncio.run(starter(choice, name))
    print("Verifying file completeness")
    verification(name)
    print("Merge the files? Y/N")
    if input(">") == "Y":
        merge_ts(name)
    else:
        print("Done")
Complete code
import aiohttp
import aiofiles
import asyncio
import requests
import os

header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36"}


def init():
    # Create the working folder if it does not exist yet
    if os.path.exists("./temp_data"):
        return
    else:
        os.mkdir("./temp_data")


def m3u8_files_download(url, name):
    # Download the m3u8 file and save it as temp_data/<name>.txt
    resp = requests.get(url)
    with open(f"temp_data/{name}.txt", mode="wb") as f:
        f.write(resp.content)
    resp.close()


def get_type(name):
    with open(f"temp_data/{name}.txt", "r") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and tag lines; show the first real entry
            if not line or line.startswith("#"):
                continue
            else:
                print("First entry:", line)
                print("Choose a mode: 1. direct download  2. joined URL")
                choice = input(">")
                return str(choice)


async def download_ts(file_name, download_url, session):
    async with session.get(download_url, headers=header) as resp:
        async with aiofiles.open(f"temp_data/{file_name}", mode="wb") as f:
            await f.write(await resp.content.read())
    print(f"File {file_name} downloaded!!")


async def starter(choice, name):
    tasks = []
    async with aiohttp.ClientSession() as session:
        if choice == "1":
            with open(f"temp_data/{name}.txt", "r") as f:
                for line in f:
                    line = line.strip()
                    if not line or line.startswith("#"):
                        continue
                    download_url = line
                    file_name = line.split("/")[-1]  # name to save the ts file under
                    tasks.append(download_ts(file_name, download_url, session))
        if choice == "2":
            url = input("Enter the base URL to prepend > ")
            with open(f"temp_data/{name}.txt", "r") as f:
                for line in f:
                    line = line.strip()
                    if not line or line.startswith("#"):
                        continue
                    file_name = line.split("/")[-1]  # name to save the ts file under
                    download_url = url + line
                    tasks.append(download_ts(file_name, download_url, session))
        print("Downloading files.....")
        # asyncio.wait() rejects bare coroutines on Python 3.11+, so gather is used.
        # return_exceptions=True keeps one failed download from aborting the rest.
        await asyncio.gather(*tasks, return_exceptions=True)  # wait for all tasks to finish
        print("Download finished")


def verification(name):
    files = []
    with open(f"temp_data/{name}.txt", "r") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            # Files are saved under the last path segment of the entry
            ts_name = line.split("/")[-1]
            if os.path.exists(f"temp_data/{ts_name}"):
                continue
            else:
                files.append(line)
    print("The following files are missing, please check manually:", files)


def merge_ts(file_name):
    new_name = input("Enter a name for the merged file > ")
    with open(f"./{new_name}.ts", "ab+") as f:
        with open(f"temp_data/{file_name}.txt", "r") as f2:
            for line in f2:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                else:
                    ts_name = line.split("/")[-1].strip()
                    try:
                        with open(f"temp_data/{ts_name}", "rb") as f3:
                            f.write(f3.read())
                    except OSError:
                        # Skip segments that are missing or unreadable
                        continue


def main():
    init()
    url = input("Enter the m3u8 file URL > ")
    name = url.rsplit("/")[-1]
    m3u8_files_download(url, name)  # download the m3u8 file
    choice = get_type(name)
    asyncio.run(starter(choice, name))
    print("Verifying file completeness")
    verification(name)
    print("Merge the files? Y/N")
    if input(">") == "Y":
        merge_ts(name)
    else:
        print("Done")


if __name__ == "__main__":
    main()
Then, as a bit of self-deception, just change the ts file's extension to .mp4 so it looks nicer.
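If you would rather do the rename in code, a minimal sketch with `pathlib` follows. It only changes the file name; the contents are not converted. The helper name `rename_to_mp4` and the file name are hypothetical, and a temporary folder is used so nothing real is touched.

```python
import tempfile
from pathlib import Path


def rename_to_mp4(ts_path):
    """Rename a .ts file to .mp4 in place and return the new path."""
    ts_path = Path(ts_path)
    new_path = ts_path.with_suffix(".mp4")
    ts_path.rename(new_path)
    return new_path


with tempfile.TemporaryDirectory() as folder:
    demo = Path(folder) / "video.ts"
    demo.write_bytes(b"fake ts data")  # stand-in for a merged ts file
    renamed = rename_to_mp4(demo)
    renamed_name = renamed.name
    renamed_exists = renamed.exists()

print(renamed_name, renamed_exists)
```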
Result
The video opens and plays normally. The script is done.
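Afterword: using the script
In theory, once aiohttp, aiofiles, and requests are installed (asyncio is part of the Python standard library), the script should work after a simple copy and paste. The values that are currently entered by hand can also be hard-coded in the script so that it can be reused in different crawlers.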
ENDING..........