Scraping m3u8 videos with Python

This article is a repost. 2021-09-27 18:29

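I am new to Python and do not yet know many of its more advanced features, so the code is rather bloated.

The simplest way to download a video is actually to use a mobile browser such as Quark: open the video site in the browser's built-in player and tap the download feature the browser provides. Under the hood, it also downloads by parsing the site's m3u8.

The problem is that the company Wi-Fi blocks all of our access to outside sites, mobile data is throttled, and I am a film buff, so I wanted to download videos on the computer and transfer them to my phone to watch. Let's go.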

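* From long observation, video sites generally put the m3u8 link inside the player's script, so it is enough to use BeautifulSoup to find the script in the page source that contains .m3u8 and extract the link from it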

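        # Requires `import requests` and `from bs4 import BeautifulSoup`;
        # htmlUrl and headers are defined elsewhere
        content = requests.get(htmlUrl, headers=headers).text
        bsObj = BeautifulSoup(content, "html.parser")
        index = 0
        for scriptItem in bsObj.find_all("script"):
            index += 1
            scriptText = str(scriptItem)
            if '.m3u8' in scriptText:
                # The link sits between '"url":"' and the '.m3u8' suffix
                m3u8Start = scriptText.find('"url":"') + 7
                m3u8End = scriptText.find(".m3u8") + 5
                # Drop the backslashes left over from escaped slashes (\/)
                m3u8Url = scriptText[m3u8Start:m3u8End].replace('\\', '')
                global m3u8Url_before
                # Save the m3u8 domain so it can be joined to ts links that are relative paths
                m3u8Url_before = "https://" + m3u8Url.split('/')[2]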
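The slicing above depends on `"url":"` appearing right before the link. As a rough alternative, the same extraction could be sketched with a regular expression and the standard library only. The helper name `extract_m3u8_url` and the sample HTML are made up for illustration, and the pattern assumes the player config stores the link as a JSON-style `"url":"...m3u8"` field, as the code above does. Note that it searches the whole page text rather than individual script tags.

```python
import re
from urllib.parse import urlsplit

# Hypothetical helper: assumes the page holds a JSON-style
# "url":"....m3u8" field, possibly with escaped slashes (\/).
M3U8_FIELD = re.compile(r'"url"\s*:\s*"([^"]*?\.m3u8)')

def extract_m3u8_url(html):
    """Return (m3u8_url, origin), or None if no link is found."""
    match = M3U8_FIELD.search(html)
    if match is None:
        return None
    url = match.group(1).replace("\\", "")  # drop the escaping backslashes
    parts = urlsplit(url)
    origin = f"{parts.scheme}://{parts.netloc}"  # kept for joining relative paths
    return url, origin

# Made-up sample page; the doubled backslashes produce \/ in the string
sample = '<script>var player = {"url":"https:\\/\\/example.com\\/v\\/index.m3u8","id":1}</script>'
print(extract_m3u8_url(sample))
```

Unlike the fixed `"https://"` prefix, `urlsplit` keeps whatever scheme the link actually uses.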

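* The m3u8 obtained here may be the final link or it may be a source link. The difference is that the final link contains /hls/, so we need to check whether the m3u8 we got contains hls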

                # When the link is a source link, load the link that contains "hls"
                # (this continues inside the `if '.m3u8' in ...` block above)
                if 'hls' not in m3u8Url:
                    content = requests.get(m3u8Url, headers=headers).text
                    # The last non-empty line of the source playlist is the m3u8 link
                    m3u8Url_hls = content.strip().split('\n')[-1].strip()
                    # Check whether this m3u8 link is an absolute path
                    if 'http' not in m3u8Url_hls:
                        m3u8Url_hls = m3u8Url_before + m3u8Url_hls
                    else:
                        m3u8Url_before = ''
            else:
                # No .m3u8 in this script; report it once the last script has been checked
                if index == len(bsObj.find_all("script")):
                    print("No m3u8 link was found in the page scripts...")
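In a source playlist, the lines that do not start with `#` are the links to the actual playlists. The sketch below picks the last such line, as the code above does, and resolves it against the source URL with `urllib.parse.urljoin`, which covers both absolute and relative paths in one call. The name `resolve_variant` and the sample playlist are illustrative and not part of the original script.

```python
from urllib.parse import urljoin

def resolve_variant(master_url, playlist_text):
    """Return the absolute URL of the last playlist listed in a source m3u8."""
    # Lines starting with '#' are tags or comments; the rest are links
    uris = []
    for raw in playlist_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            uris.append(line)
    if not uris:
        return None
    # urljoin leaves absolute URLs untouched and resolves relative ones
    return urljoin(master_url, uris[-1])

# Made-up source playlist with a root-relative link
master = (
    "#EXTM3U\n"
    "#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=1080x608\n"
    "/20210927/abc/hls/index.m3u8\n"
)
print(resolve_variant("https://example.com/20210927/abc/index.m3u8", master))
```

Joining against the full source URL instead of only the domain also handles links such as `hls/index.m3u8` that are relative to the current directory.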

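* Once we have the m3u8 link, fetch all the ts links and store them in a local txt file. On later runs, if this file already exists, the two steps above are skipped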

import re
import requests

def tsList():    # Store the ts links in a local txt file (name and headers are defined elsewhere)
    path = 'E:/python/xxx/' + name + '.txt'
    with open(path, 'r') as f:
        lines = f.readlines()
    if '.ts' in ''.join(lines):
        print('All ts links are already stored; no need to request them again')
    else:
        print('Fetching and storing the ts links')
        m3u8Url = lines[0].strip()    # the first line of the txt file is the m3u8 link
        # Fetch the m3u8 file content with requests
        response = requests.get(m3u8Url, headers=headers)
        # Look for the encryption tag; if there is one, request its link to get
        # the decryption key and store it in the same txt file
        jiami = re.findall('#EXT-X-KEY:(.*)\n', response.text)
        keyUri = re.search('URI="([^"]*)"', jiami[0]) if jiami else None
        if keyUri:
            key = keyUri.group(1)
            if 'http' not in key:
                # Relative key path: prefix it with the m3u8 domain
                m3u8Url_before = "https://" + m3u8Url.split('/')[2]
            else:
                m3u8Url_before = ''
            # headers must be passed by keyword; the second positional argument is params
            keycontent = requests.get(m3u8Url_before + key, headers=headers).text
            with open(path, 'a') as f:
                f.write(keycontent + '\n')
        else:
            # No encryption: write 000000000000 as a placeholder for the download step to check
            with open(path, 'a') as f:
                f.write('000000000000\n')
        # Match every ts link in the m3u8 content with a regular expression
        if response.status_code == 200:
            pattern = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
            with open(path, 'a') as f:
                for line in response.text.splitlines():
                    match = pattern.match(line.strip())    # tag lines start with '#', so they never match
                    if match:
                        f.write(match.group(0) + '\n')

    Download()
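For comparison, here is a minimal sketch of the parsing step on its own, with no network or file access, so it can be tried on a pasted playlist. It reads the m3u8 text line by line: an `#EXT-X-KEY` tag supplies the key link, and every line that does not start with `#` is treated as a ts segment. The function name `parse_media_playlist` and the sample playlist are hypothetical. Because it resolves each entry with `urljoin`, it also picks up relative ts paths, which the `http`-only regular expression above does not match. If a playlist carries several key tags, this sketch keeps only the last one.

```python
import re
from urllib.parse import urljoin

KEY_URI = re.compile(r'URI="([^"]*)"')

def parse_media_playlist(playlist_url, text):
    """Return (key_url_or_None, list_of_absolute_segment_urls)."""
    key_url = None
    segments = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#EXT-X-KEY:"):
            match = KEY_URI.search(line)
            if match:  # a METHOD=NONE tag carries no URI
                key_url = urljoin(playlist_url, match.group(1))
        elif not line.startswith("#"):
            # Any non-tag line is a segment link, relative or absolute
            segments.append(urljoin(playlist_url, line))
    return key_url, segments

# Made-up playlist mixing a relative and an absolute segment link
sample = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-KEY:METHOD=AES-128,URI="key.key"
#EXTINF:10.0,
seg0.ts
#EXTINF:10.0,
https://cdn.example.com/seg1.ts
#EXT-X-ENDLIST
"""
key, ts_links = parse_media_playlist("https://example.com/hls/index.m3u8", sample)
print(key)
print(ts_links)
```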

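* That completes the preparation. If everything went well, xxx.txt now holds the m3u8 link on its first line, then the key (or the 000000000000 placeholder), then the ts links

To be continued.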
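To make the file layout concrete, the sketch below writes and reads back such a file. It assumes the layout implied by the code above: the m3u8 link on the first line, the key or the `000000000000` placeholder on the second, and one ts link per line after that. The helpers `write_index` and `read_index` are hypothetical names, and the example writes to a temporary directory rather than `E:/python/xxx/`.

```python
import os
import tempfile

PLACEHOLDER = "000000000000"  # written when the playlist is not encrypted

def write_index(path, m3u8_url, key, ts_urls):
    """Write the assumed layout: m3u8 link, key or placeholder, then ts links."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(m3u8_url + "\n")
        f.write((key or PLACEHOLDER) + "\n")
        f.writelines(url + "\n" for url in ts_urls)

def read_index(path):
    """Read the file back as (m3u8_url, key_or_None, ts_urls)."""
    with open(path, encoding="utf-8") as f:
        lines = [line.strip() for line in f if line.strip()]
    m3u8_url, key, ts_urls = lines[0], lines[1], lines[2:]
    return m3u8_url, (None if key == PLACEHOLDER else key), ts_urls

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xxx.txt")
    write_index(path, "https://example.com/hls/index.m3u8", None,
                ["https://example.com/hls/seg0.ts"])
    print(read_index(path))
```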
