Web Scraping in Practice: NetEase Cloud Music

Reposted article; original published 2019-03-12 10:51. Tag: web scraping

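Analysis shows that tracks on NetEase Cloud Music fall into three categories: free tracks, tracks that require a membership to download, and tracks that must be paid for to listen to.

The first two categories cover the vast majority of tracks; pay-to-listen tracks are only a tiny minority.

Goal of this scraper: make tracks that require a membership to download downloadable for free.

The key: NetEase Cloud Music exposes a music download endpoint:

http://music.163.com/song/media/outer/url?id=<song ID>.MP3
Replace <song ID> with the ID of the track you want, then request the link to get the MP3 file.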
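As a minimal sketch of that substitution, the helper below fills a song ID into the endpoint. The name `build_download_url` and the numeric-ID check are assumptions made for illustration, not part of the original article; the lowercase `.mp3` suffix follows the article's own code. Whether the endpoint still serves a given track is not guaranteed.

```python
# Endpoint template quoted in the article; only the song ID varies.
OUTER_URL_TEMPLATE = "http://music.163.com/song/media/outer/url?id={song_id}.mp3"


def build_download_url(song_id):
    """Return the download link for a song ID (hypothetical helper)."""
    song_id = str(song_id).strip()
    # Assumption: song IDs are purely numeric, as in the article's examples.
    if not song_id.isdigit():
        raise ValueError("song ID must be numeric, got %r" % song_id)
    return OUTER_URL_TEMPLATE.format(song_id=song_id)


if __name__ == "__main__":
    print(build_download_url(1345848098))
```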

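Method 1 (for beginners)

In the NetEase Cloud Music client, find the track's "Copy link" option.

Paste the copied link somewhere; it looks like this: https://music.163.com/song?id=1345848098&userid=315893058

1345848098 is the song ID.

Substitute it into the endpoint to get the download link:

綠色 by 陳雪凝 (a member-download track): http://music.163.com/song/media/outer/url?id=1345848098.MP3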
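The manual steps of Method 1 can be sketched in a few lines using only the standard library. The function name `extract_song_id` is hypothetical. The fallback for links of the form `https://music.163.com/#/song?id=...` is an assumption about links copied from the browser address bar; the article itself only shows the plain `?id=` form.

```python
from urllib.parse import urlparse, parse_qs


def extract_song_id(share_link):
    """Return the `id` query parameter of a share link, or None if absent."""
    parsed = urlparse(share_link)
    # Assumption: some links carry the query inside the fragment ("#/song?id=...").
    query = parsed.query or urlparse(parsed.fragment).query
    values = parse_qs(query).get("id")
    return values[0] if values else None


if __name__ == "__main__":
    link = "https://music.163.com/song?id=1345848098&userid=315893058"
    song_id = extract_song_id(link)
    # Substitute the ID into the endpoint from the article.
    print("http://music.163.com/song/media/outer/url?id=" + song_id + ".mp3")
```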

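Method 2 (scraper):
This method is not a code version of Method 1; it downloads an entire playlist.
Step 1: get the playlist link (use "Copy link" in the playlist's share menu).

The link looks like https://music.163.com/playlist?id=2520126575&userid=315893058

Step 2: open the link in a browser.

Press F12 to open the developer tools.

Click the icon marked by the arrow (the element inspector), then click any track name.

The matching line in the HTML source below is highlighted in blue.

That region is the markup for the track. Because this is a list, every track uses the same format.

The core of the scrape: extract every a tag on the page, then check whether it contains a b tag and whether its href starts with /song?id=. The b tag holds the track name. Some a tags resemble track links but have no b tag; those are not the tracks we want, so they must be excluded.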
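The filter described above can be sketched with the standard-library `html.parser` module (swapped in here for BeautifulSoup so the example has no third-party dependency). The class name `SongLinkParser` and the sample HTML are invented for illustration and only mimic the structure described in the text. Note that the article's own code filters on the href prefix alone and does not check for a b tag.

```python
from html.parser import HTMLParser

SONG_PREFIX = "/song?id="


class SongLinkParser(HTMLParser):
    """Collect <a href="/song?id=..."> links that contain a <b> tag."""

    def __init__(self):
        super().__init__()
        self.songs = []       # accepted (song_id, name) pairs
        self._href = None     # href of the open <a>, if it is a song link
        self._has_b = False   # whether a <b> was seen inside that <a>
        self._text = []       # text fragments collected inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            self._href = href if href.startswith(SONG_PREFIX) else None
            self._has_b = False
            self._text = []
        elif tag == "b" and self._href is not None:
            self._has_b = True

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self._has_b:
                song_id = self._href[len(SONG_PREFIX):]
                self.songs.append((song_id, "".join(self._text).strip()))
            self._href = None


# Invented sample markup: one real song link and two links to be excluded.
SAMPLE_HTML = """
<a href="/song?id=1345848098"><b title="Song A">Song A</b></a>
<a href="/song?id=999">song-like link without a b tag</a>
<a href="/artist?id=1"><b>Not a song</b></a>
"""

parser = SongLinkParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.songs)
```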

Code:

Pass in the playlist link, parse the playlist, collect the IDs of all its tracks, and generate the download links.

    def ParsingPlayList(self, url):
        # Fetch the playlist page with a browser-like User-Agent.
        response = requests.get(url=url, headers=CloudMusic.header)
        soup = BeautifulSoup(response.text, "html.parser")
        Songs = []
        # Keep only <a> tags whose href starts with "/song?id=".
        # (Only the href prefix is checked here, not the presence of a <b> tag.)
        for music in soup.select("a"):
            href = str(music.attrs.get("href", ""))
            if href.startswith("/song?id="):
                song_id = href.replace("/song?id=", "")
                Songs.append({
                    "id": song_id,
                    "url": "http://music.163.com/song/media/outer/url?id=" + song_id + ".mp3",
                    "name": music.text
                })
        return Songs

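import os

import requests
from bs4 import BeautifulSoup

# Return codes of Down(): downloaded, failed, already present locally.
STATUS_OK, STATUS_ERROR, STATUS_EXITS = 1, -1, 0


class CloudMusic:
    # Browser-like User-Agent sent with the page and download requests.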
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0'
    }
    def Down(self, down_url, filePath, NowIndex, TotalCount, FileDir):
        # Create the target directory on first use.
        if not os.path.isdir(FileDir):
            os.makedirs(FileDir)
        target = os.path.join(FileDir, filePath + ".mp3")
        if os.path.isfile(target):
            print(filePath + ": already exists locally")
            return STATUS_EXITS
        try:
            # Do not follow the redirect automatically; read the real file URL
            # from the Location header and request it separately.
            response = requests.get(down_url, headers=CloudMusic.header, allow_redirects=False)
            r = requests.get(response.headers['Location'], stream=True)
            size = int(r.headers['content-length'])
            print('\033[0;31m' + str(NowIndex) + "/" + str(TotalCount) + "  Downloading: " + filePath + "  size: " + str(size) + " bytes" + "\033[0m")
            CurTotal = 0
            with open(target, "wb") as f:
                # Stream the body in 512 KiB chunks and print a running percentage.
                for chunk in r.iter_content(chunk_size=512*1024):
                    if chunk:
                        f.write(chunk)
                        CurTotal += len(chunk)
                        print("\r" + filePath + " -- progress: " + '%3s' % (str(CurTotal*100//size)) + "%", end='')
                print()
            r.close()
            return STATUS_OK
        except Exception as e:
            print(filePath + ": download failed! Error: " + str(e.args))
            # Remove any partially written file.
            if os.path.isfile(target):
                os.remove(target)
            return STATUS_ERROR

    def ParsingPlayList(self, url):
        # Fetch the playlist page with a browser-like User-Agent.
        response = requests.get(url=url, headers=CloudMusic.header)
        soup = BeautifulSoup(response.text, "html.parser")
        Songs = []
        # Keep only <a> tags whose href starts with "/song?id=".
        # (Only the href prefix is checked here, not the presence of a <b> tag.)
        for music in soup.select("a"):
            href = str(music.attrs.get("href", ""))
            if href.startswith("/song?id="):
                song_id = href.replace("/song?id=", "")
                Songs.append({
                    "id": song_id,
                    "url": "http://music.163.com/song/media/outer/url?id=" + song_id + ".mp3",
                    "name": music.text
                })
        return Songs

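    def Start(self, MusicList, Dd):
        # Download every track in MusicList into directory Dd and tally the results.
        total = len(MusicList)
        CurIndex = OkCount = FalseCount = ExitCount = 0
        print("Tracks in playlist: " + str(total))
        for data in MusicList:
            CurIndex += 1
            # "/" is stripped from the track name so it can be used as a file name.
            status = self.Down(data["url"], data["name"].replace("/", ""), CurIndex, total, Dd)
            if status == STATUS_OK:
                OkCount += 1
            elif status == STATUS_EXITS:
                ExitCount += 1
            else:
                FalseCount += 1
        print("Downloaded: " + str(OkCount) + "\nFailed: " + str(FalseCount) + "\nAlready present locally: " + str(ExitCount))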

if __name__=="__main__":
    CrawlerClient= CloudMusic()
    # CrawlerClient.Start(CrawlerClient.ParsingPlayList("https://music.163.com/playlist?id=1992662269&userid=315893058"), "廣場舞")
    # CrawlerClient.Start(CrawlerClient.ParsingPlayList("https://music.163.com/playlist?id=2584781662"),"治愈")
    CrawlerClient.Start(CrawlerClient.ParsingPlayList("https://music.163.com/playlist?id=2243470689&userid=315893058"),"mp3")

