Python Crawler: Downloading Meipai Videos of Young Women

Reposted article. 2018-09-19 15:26, 1349 views.

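I have recently been writing an application that needs to collect some popular videos from Weibo. Short videos like these usually come from Miaopai, Weipai, Meipai, and Sina Video, and none of them offer a download option, so I had to work out a way to get them myself.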

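Step 1

Analyze the page source. Take http://video.weibo.com/show?fid=1034:0988e59a12e5178acb7f23adc3fe5e97 as an example: right-click and view the page source. Video files usually have an .mp4 suffix, but searching for it here finds nothing. On some sites, such as Meipai, the video link can be seen directly in the source.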
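The search in Step 1 can be automated. The following is a minimal Python 3 sketch, assuming the page source has already been fetched as a string; the regular expression is an approximation of "an absolute URL ending in .mp4", and `find_mp4_links` is a hypothetical helper, not part of the original code.

```python
import re

# Approximate pattern for absolute http(s) URLs ending in ".mp4", with an
# optional query string; it stops at quotes, whitespace and angle brackets.
MP4_URL = re.compile(r'https?://[^\s"\'<>]+?\.mp4(?:\?[^\s"\'<>]*)?', re.IGNORECASE)


def find_mp4_links(html):
    """Return the distinct .mp4 URLs in a page's source, in order of first appearance."""
    links = []
    for url in MP4_URL.findall(html):
        if url not in links:
            links.append(url)
    return links
```

An empty result, as the article reports for the Weibo example page, is the cue to move on to inspecting network traffic.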

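Step 2

Capture and analyze the requests and responses. Chrome's developer tools can do this as well. Using the same example, right-click -> Inspect -> Network, then press F5 to reload the page.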

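Many requests show up, so you have to go through them one by one. Video files come in only a few formats (mp4, flv, avi), so the right request is easy to spot. Opening its URL in the browser confirms that it is the download link we want.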
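Going through requests one by one can be shortened by filtering on the file extension. Chrome's Network panel can export the captured requests as a HAR file (JSON; the export option is in the request list's right-click menu, and its exact wording varies between Chrome versions). The sketch below assumes such an export and reads `log.entries[].request.url` as defined by the HAR format; the function names and the extension list are illustrative assumptions. `.m3u8` is included because of the playlist link that Step 3 relies on.

```python
import json
from urllib.parse import urlparse

# Extensions worth checking among captured requests (an assumed, non-exhaustive list)
VIDEO_EXTENSIONS = ('.mp4', '.flv', '.avi', '.m3u8')


def video_requests(har):
    """Return request URLs from a parsed HAR log whose path ends in a video extension."""
    urls = []
    for entry in har.get('log', {}).get('entries', []):
        url = entry['request']['url']
        # Check only the path, so a query string such as "?t=1" does not hide the extension
        if urlparse(url).path.lower().endswith(VIDEO_EXTENSIONS):
            urls.append(url)
    return urls


def video_requests_from_file(path):
    """Load a HAR file exported from DevTools and return the matching URLs."""
    with open(path, encoding='utf-8') as f:
        return video_requests(json.load(f))
```

Each URL returned can then be opened in the browser to confirm it, as described above.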

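Step 3

Work out how the download link relates to the video page link, that is, how http://video.weibo.com/show?fid=1034:0988e59a12e5178acb7f23adc3fe5e97 maps to xxx.mp4. This again means analyzing the page source. The key is the link ending in .m3u8 among the requests above. An .m3u8 file is a plain-text index: a player does not play the file itself, but uses the index to find the network addresses of the corresponding audio and video files and streams them. Opening it shows that it does contain the download link we want, and the .m3u8 link itself appears in the page source.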
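Reading the download link out of an .m3u8 file comes down to the format's line rules: lines starting with `#` are tags or comments, blank lines are ignored, and every other line is a media URI. The Python 3 sketch below (the name `media_uris` is hypothetical) collects those URIs. It resolves relative URIs against the playlist's own URL, which is what the HLS specification (RFC 8216) prescribes; note that the original code below instead prepends the fixed prefix `http://us.sinaimg.cn/`, and which of the two Sina's playlists need is not confirmed here. In general a playlist may list many segments rather than a single .mp4 file.

```python
from urllib.parse import urljoin


def media_uris(playlist_text, playlist_url):
    """Return the media URIs listed in an .m3u8 playlist as absolute URLs."""
    uris = []
    for line in playlist_text.splitlines():
        line = line.strip()
        # Skip blank lines and "#EXT..." tags / comments
        if not line or line.startswith('#'):
            continue
        # Relative URIs are resolved against the playlist URL; absolute ones pass through
        uris.append(urljoin(playlist_url, line))
    return uris
```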

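Summary

Putting the three steps together, the way to get a video's download link is: extract the .m3u8 link from the page source, download that file, read the video download link from it, and finally download the video.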
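The chain summarized above can be expressed as one function. This is a sketch only: `find_video_url` and its `fetch_text` parameter are hypothetical (not part of the original code), and `fetch_text` is passed in so the logic can be exercised without network access. It reuses the `list=...&fid` pattern from `SinaVideo.getM3u8` and, like `getSinavideoUrl`, takes the first URI in the playlist; unlike the original, it resolves that URI against the playlist URL rather than the fixed `http://us.sinaimg.cn/` prefix, which is an assumption.

```python
import re
from urllib.parse import unquote, urljoin

# The .m3u8 link appears percent-encoded between "list=" and "&fid" in the page source
M3U8_IN_PAGE = re.compile(r'list=([\s\S]*?)&fid')


def find_video_url(page_url, fetch_text):
    """Follow page -> .m3u8 -> video link; fetch_text(url) must return the body as str."""
    match = M3U8_IN_PAGE.search(fetch_text(page_url))
    if match is None:
        return None
    m3u8_url = unquote(match.group(1))
    for line in fetch_text(m3u8_url).splitlines():
        line = line.strip()
        if line and not line.startswith('#'):
            # First URI in the playlist, resolved against the playlist URL
            return urljoin(m3u8_url, line)
    return None
```

Against live pages, `fetch_text` could be `lambda u: urllib.request.urlopen(u).read().decode('utf-8')`; the resulting URL is then downloaded as in `Common.download`.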

Source code

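# sinavideo.py
import os
import re
from urllib.parse import unquote, urljoin

from common import Common


class SinaVideo:

    # Base URL for the relative video path found in the .m3u8 file
    URL_PREFIX = "http://us.sinaimg.cn/"

    def getM3u8(self, html):
        # The .m3u8 link sits percent-encoded between "list=" and "&fid" in the page source
        result = re.findall(r'list=([\s\S]*?)&fid', html)
        if not result:
            raise ValueError("no .m3u8 link found in the page source")
        return result[0]

    def getName(self, url):
        # Use the fid value as the file name; ":" is not allowed in Windows file names
        return url.split('=')[1].replace(':', '_')

    def getSinavideoUrl(self, filepath):
        # Return the first line that is neither blank nor a "#" tag/comment,
        # stripped of its trailing newline so it can be used in a URL
        with open(filepath, 'r', encoding='utf-8') as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith('#'):
                    return line
        return None

    def download(self, url, filepath):
        # Derive the file name from the page URL
        name = self.getName(url)
        html = Common.getHtml(url)
        m3u8 = self.getM3u8(html)
        Common.download(unquote(m3u8), filepath, name + '.m3u8')
        video_path = self.getSinavideoUrl(os.path.join(filepath, name + '.m3u8'))
        if video_path is None:
            raise ValueError("no video link found in the .m3u8 file")
        # urljoin keeps absolute URLs as-is and resolves relative ones against the prefix
        Common.download(urljoin(self.URL_PREFIX, video_path), filepath, name + '.mp4')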

# common.py
import os
import urllib.request


class Common:
    # Fetch a page's HTML source as text
    @staticmethod
    def getHtml(url):
        with urllib.request.urlopen(url) as response:
            html = response.read().decode('utf-8', errors='replace')
        print("[+] Fetched page source: " + url)
        return html

    # Download a file to filepath/filename
    @staticmethod
    def download(url, filepath, filename):
        headers = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Charset': 'UTF-8,*;q=0.5',
            # No gzip in Accept-Encoding: urllib does not decompress responses,
            # so a compressed .m3u8 would otherwise be saved still compressed
            'Accept-Language': 'en-US,en;q=0.8',
            'User-Agent': 'Mozilla/5.0 (Linux; Android 4.4.2; Nexus 4 Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.114 Mobile Safari/537.36'
        }
        request = urllib.request.Request(url, headers=headers)
        path = os.path.join(filepath, filename)
        with urllib.request.urlopen(request) as response, open(path, 'wb') as output:
            while True:
                # Read 256 KB at a time instead of loading the whole video into memory
                buffer = response.read(1024 * 256)
                if not buffer:
                    break
                # received += len(buffer)
                output.write(buffer)

        print("[+] Downloaded file: " + path)

    # Check whether a path exists
    @staticmethod
    def isExist(filepath):
        return os.path.exists(filepath)

    # Create a directory, including any missing parents
    @staticmethod
    def createDir(filepath):
        os.makedirs(filepath, 0o777)
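The commented-out `received += len(buffer)` line in `Common.download` hints at progress reporting. A minimal Python 3 sketch of that idea follows; `copy_with_progress` is a hypothetical helper that works with any readable and writable binary streams, and the in-memory demo stands in for a real download.

```python
import io


def copy_with_progress(src, dst, total=None, chunk_size=256 * 1024):
    """Copy src to dst in chunks, printing bytes received; returns the byte count.

    src needs read(n) (e.g. a urlopen response), dst needs write(b).
    total is the expected size (e.g. from Content-Length), or None if unknown.
    """
    received = 0
    while True:
        buffer = src.read(chunk_size)
        if not buffer:
            break
        received += len(buffer)
        dst.write(buffer)
        if total:
            print(f"\r{received}/{total} bytes ({received * 100 // total}%)", end="")
        else:
            print(f"\r{received} bytes", end="")
    print()
    return received


if __name__ == "__main__":
    # Offline demo: copy 1 MB from an in-memory stream
    data = io.BytesIO(b"x" * (1024 * 1024))
    copy_with_progress(data, io.BytesIO(), total=1024 * 1024)
```

With a real response, `total` could come from `int(response.headers.get('Content-Length', 0)) or None`, since not every server sends that header.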

Usage:

url = "http://video.weibo.com/show?fid=1034:0988e59a12e5178acb7f23adc3fe5e97"
sinavideo = SinaVideo()
# The output directory must already exist
sinavideo.download(url, "/Users/cheng/Documents/PyScript/res/")

Result

[Screenshot of the result not preserved]

