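1. Overview

Scraping steps

Step 1: Open the web page that hosts the video

Step 2: Use the browser's developer tools (F12) to find the real video link

Step 3: Extract that link in code

Step 4: Save the video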

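2. Analyzing the video link

Open the web page that hosts the video

Taking ku6.com as an example, click any video to open its playback page, for instance: https://www.ku6.com/video/detail?id=udfY7DjsSXbg8ghbDnhUwNTinOY

Use F12 to find the real video link

Press F12 to open the developer tools and go to Network --> Media, which lists the media files the page loads.

Refresh the page and the request that actually loads the video appears in the list.

Extract the link in code

View the page source and find the inline JavaScript. The real video address appears there.

A regular expression match is then enough to pull out the video address.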
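As a minimal sketch of that matching step, the snippet below runs a pattern of the same shape as the one used in the full script against a hard-coded fragment of page source. The fragment `sample_source`, the example.com address, and the helper `extract_video_url` are assumptions made up for illustration; the real markup on ku6.com may differ.

```python
import re

# Hypothetical fragment of inline JavaScript, made up for illustration
sample_source = '''
player.src({ type: "video/mp4", src: "https://example.com/wifi/abc123" });
'''

# The non-greedy group captures everything up to the closing quote of src
VIDEO_PATTERN = re.compile(r'type: "video/mp4", src: "(.*?)"')


def extract_video_url(page_source):
    """Return the first mp4 address found in page_source, or None."""
    match = VIDEO_PATTERN.search(page_source)
    return match.group(1) if match else None


print(extract_video_url(sample_source))
print(extract_video_url("<html>no video here</html>"))
```

Returning None instead of indexing into an empty list makes it easy to skip pages where the pattern does not match.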

Opening this video address directly in the browser plays the video:

https://rbv01.ku6.com/wifi/o_1ej9q59hk1rpe7hm9os1v9v1btna
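The address carries no file extension, so a saved copy needs a name of its own. The sketch below is one way to derive it, assuming the last path segment is a usable name and the content is MP4; `filename_from_url` is a hypothetical helper, not part of the original script.

```python
import posixpath
from urllib.parse import urlparse


def filename_from_url(video_url, extension=".mp4"):
    """Build a local file name from the last path segment of video_url."""
    # urlparse(...).path excludes any ?query or #fragment part
    path = urlparse(video_url).path
    # Fall back to a fixed name when the URL has no path segment
    stem = posixpath.basename(path.rstrip("/")) or "video"
    return stem if stem.endswith(extension) else stem + extension


print(filename_from_url("https://rbv01.ku6.com/wifi/o_1ej9q59hk1rpe7hm9os1v9v1btna"))
```

Parsing the URL first, rather than splitting the raw string on "/", keeps query parameters out of the file name.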

Save

Because the address plays directly, it serves the raw video file, so the requests module can download it as binary data.
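The core of a binary download is writing the response body to disk chunk by chunk instead of loading it into memory at once. The sketch below shows only that part; `save_chunks` is a hypothetical helper, and `fake_chunks` stands in for the chunks a streamed response would yield so that the example runs offline.

```python
import os
import tempfile


def save_chunks(chunks, dst):
    """Write an iterable of byte chunks to dst and return the bytes written."""
    written = 0
    with open(dst, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


# Stand-in for the chunks of a streamed HTTP response
fake_chunks = [b"\x00\x01", b"", b"\x02\x03\x04"]
dst = os.path.join(tempfile.mkdtemp(), "demo.bin")
print(save_chunks(fake_chunks, dst))
```

With requests, the chunks would come from `requests.get(url, stream=True).iter_content(chunk_size=8192)`; the chunk size here is an arbitrary choice.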

3. Code implementation

Complete code

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re
from lxml import etree
import requests
from tqdm import tqdm
import os
from urllib.request import urlopen

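def download_from_url(url, dst):
    """
    Download a file, resuming from a partial local copy if one exists.

    :param url: URL of the file to download
    :param dst: local path to save the file to
    :return: True if the file is complete after the call, False on error
    """
    # Get the remote file length (-1 if the server does not report it)
    try:
        with urlopen(url) as resp:
            file_size = int(resp.info().get('Content-Length', -1))
    except Exception as e:
        print(e)
        print("Error: could not access url: %s" % url)
        return False

    # Size of the local file if it exists, otherwise start from 0
    if os.path.exists(dst):
        first_byte = os.path.getsize(dst)
    else:
        first_byte = 0

    # The local file already holds every byte, so there is nothing to download
    if 0 <= file_size <= first_byte:
        print("File already exists, no download needed")
        return True

    # Request only the missing bytes; an open-ended range runs to the end of the file
    header = {"Range": "bytes=%s-" % first_byte}

    try:
        req = requests.get(url, headers=header, stream=True)
        req.raise_for_status()
    except Exception as e:
        print(e)
        return False

    # 206 means the server honoured the Range header; any other success
    # status carries the whole file, so start again from byte 0
    if req.status_code != 206:
        first_byte = 0
    mode = 'ab' if first_byte else 'wb'

    pbar = tqdm(
        total=file_size if file_size > 0 else None, initial=first_byte,
        unit='B', unit_scale=True, desc=url.split('/')[-1])

    # Stream the response body into the local file
    try:
        with open(dst, mode) as f:
            for chunk in req.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)
                    # The last chunk may be shorter than 1024 bytes
                    pbar.update(len(chunk))
    except Exception as e:
        print(e)
        return False
    finally:
        pbar.close()

    return True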

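def DownloadFile(url, name):
    """
    Download a file in one pass, without resume support.

    :param url: URL of the file to download
    :param name: local file name to save to
    :return: True on success, False on failure
    """
    try:
        resp = requests.get(url=url, stream=True)
        resp.raise_for_status()
        # Size in KiB, rounded up so the total matches the number of 1024-byte chunks
        content_size = (int(resp.headers['Content-Length']) + 1023) // 1024
        with open(name, "wb") as f:
            print("package total size is:", content_size, 'k,start...')
            for data in tqdm(iterable=resp.iter_content(1024), total=content_size, unit='k', desc=name):
                f.write(data)

        print("%s downloaded successfully" % url)
        return True
    except Exception as e:
        print(e)
        print("%s download failed" % url)
        return False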

# Request headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Safari/537.36'
}

# Fetch the listing page
response = requests.get('https://www.ku6.com/detail/371', headers=headers)
data = response.text

# Build an XPath-capable tree; etree.HTML also repairs malformed HTML
html = etree.HTML(data)
# Collect the video playback links
html_data = html.xpath('//div[@class="r_box"]/ul/li//a/@href')
# print("html_data", html_data, type(html_data))

# Iterate over the playback page URLs
for i in html_data:
    url = "https://www.ku6.com%s" % i
    print(url)

    # Fetch the playback page
    response_1 = requests.get(url, headers=headers)
    data_1 = response_1.text
    # Match the video address with a regular expression
    video = re.findall('type: "video/mp4", src: "(.*?)"', data_1)
    if not video:
        # No mp4 source found on this page, so skip it
        print("no video address found in", url)
        continue
    video_1 = video[0]
    print("video_1", video_1)
    x = video_1.split('/')[-1]

    # Local file name for the saved video
    name = f'{x}.mp4'
    print("name", name)

    # Download the video
    download_from_url(video_1, name)

    # This demo only downloads the first video, so break here
    break
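The resume logic in download_from_url comes down to two decisions: how many bytes are already on disk, and which Range header asks the server for the rest. The sketch below isolates those decisions so they can be checked without a network connection; `resume_plan` is a hypothetical helper, not part of the script above.

```python
import os
import tempfile


def resume_plan(dst, remote_size):
    """Decide how to continue a download.

    Returns (first_byte, headers), or None when dst is already complete.
    """
    first_byte = os.path.getsize(dst) if os.path.exists(dst) else 0
    if 0 < remote_size <= first_byte:
        return None
    # "bytes=N-" asks for everything from offset N to the end of the file
    return first_byte, {"Range": "bytes=%d-" % first_byte}


# Simulate a download interrupted after 4 of 10 bytes
partial = os.path.join(tempfile.mkdtemp(), "partial.bin")
with open(partial, "wb") as f:
    f.write(b"abcd")

print(resume_plan(partial, 10))
print(resume_plan(partial, 4))
```

An open-ended range avoids an off-by-one mistake: byte offsets are zero-based, so the last byte of a file of size N is N-1, not N.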


Running the code downloads the first video on the listing page and shows a tqdm progress bar in the terminal.

Reference for this article:
https://cloud.tencent.com/developer/article/1471132
