Python Journey 4: A First Crawler for KuGou Music Playback Links, Stored in a MySQL Database


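Introduction: This is my first crawler, so corrections are welcome. The approach borrows from another developer's crawling write-up. My own additions are fetching the music playback URL and storing the results in a MySQL database.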

Software used: Navicat for MySQL and Postman

Data scraped: lyrics, song name, artist, playback URL, cover image, song duration, file size, and so on

Modules used by the crawler:

import time

import pymysql
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
import json

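Crawling approach and problems:

1·The hash and mid encryption problem

2·A workaround for the song playback URL and the song detail request URL

3·Encoding of the returned data

4·Getting the localStorage and cookie values from the song page

5·Connecting to the database and storing the data in MySQL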

Working through the problems, taking the request for one song as an example:

URL: https://wwwapi.kugou.com/yy/index.php?r=play/getdata&callback=jQuery19108013258872165683_1631704461109&hash=BC4E172CF13BB79303203A48246D84E1&dfid=2C8TCD3wtvFq3A2P4h4Slbtf&appid=1014&mid=ee9a0573ca7b9cda6b916c684b10b6da&platid=4&album_id=38915273&_=1631704461111

Never mind for now where this URL comes from. Let's analyze the GET request first.

·Parameters involved:

1·hash:BC4E172CF13BB79303203A48246D84E1

2·dfid:2C8TCD3wtvFq3A2P4h4Slbtf

3·appid:1014

4·mid:ee9a0573ca7b9cda6b916c684b10b6da

5·platid:4

6·_:1631704461111

7·callback:jQuery19108013258872165683_1631704461109

8·album_id:38915273
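As a quick cross-check, the same parameter list can be pulled out of the URL with Python's standard library. This is only a sketch: the helper name `query_params` is made up for illustration, and the URL is the example request quoted above.

```python
from urllib.parse import urlsplit, parse_qsl

# The example request from above, split across lines for readability.
URL = (
    "https://wwwapi.kugou.com/yy/index.php?r=play/getdata"
    "&callback=jQuery19108013258872165683_1631704461109"
    "&hash=BC4E172CF13BB79303203A48246D84E1"
    "&dfid=2C8TCD3wtvFq3A2P4h4Slbtf&appid=1014"
    "&mid=ee9a0573ca7b9cda6b916c684b10b6da&platid=4"
    "&album_id=38915273&_=1631704461111"
)


def query_params(url):
    """Return the query-string parameters of a URL as a dict (hypothetical helper)."""
    return dict(parse_qsl(urlsplit(url).query))


params = query_params(URL)
for name, value in params.items():
    print(f"{name} = {value}")
```

Listing the parameters this way makes it easier to delete them one at a time later and see which ones the server really needs.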

Then open this request in Postman. The result is as follows:

 

Formatted as JSON:

jQuery19108013258872165683_1631704461109({
    "status": 1,
    "err_code": 0,
    "data": {
        "hash": "BC4E172CF13BB79303203A48246D84E1",
        "timelength": 168000,
        "filesize": 2701932,
        "audio_name": "傅夢彤、安蘇羽 - 潮汐 (Natural)",
        "have_album": 1,
        "album_name": "潮汐 (Natural)",
        "album_id": "38915273",
        "img": "http://imge.kugou.com/stdmusic/20201204/20201204164503970613.jpg",
        "have_mv": 1,
        "video_id": "4709291",
        "author_name": "傅夢彤、安蘇羽",
        "song_name": "潮汐 (Natural)",
        "lyrics": "[id:$00000000]\r\n[ar:傅夢彤、安蘇羽]\r\n[ti:潮汐 (Natural)]\r\n[by:]\r\n[hash:bc4e172cf13bb79303203a48246d84e1]\r\n[al:]\r\n[sign:]\r\n[qq:]\r\n[total:168829]\r\n[offset:0]\r\n[00:00.08]傅夢彤、安蘇羽 - 潮汐 (Natural)\r\n[00:00.87]作詞:安蘇羽、舒心\r\n[00:01.12]混音:謝驍\r\n[00:21.12]當海面迎來洶涌的潮汐\r\n[00:23.55]我奔跑尋找昔日的足跡\r\n[00:26.18]夕陽下倒影迷人的美麗\r\n[00:28.71]可我卻丟失故事和你\r\n[00:31.34]你說過向往大海的神秘\r\n[00:33.92]也憧憬我們遺失的過去\r\n[00:36.50]分享給大海秘密\r\n[00:39.74]藍色的海底\r\n[00:42.27]遠山的風景\r\n[00:45.16]我們的距離遙不可及\r\n[00:50.02]退守的愛情\r\n[00:52.70]還剩下回憶\r\n[00:55.03]瘋狂地尋覓你的身影\r\n[01:00.69]殘月憂郁\r\n[01:03.07]星夜靜謐\r\n[01:05.60]潮落嘆息\r\n[01:11.04]聆聽山語\r\n[01:13.42]回盪不清\r\n[01:15.91]若即若離\r\n[01:23.05]當海面迎來洶涌的潮汐\r\n[01:25.53]我奔跑尋找昔日的足跡\r\n[01:28.11]夕陽下倒影迷人的美麗\r\n[01:30.65]可我卻丟失故事和你\r\n[01:33.28]你說過向往大海的神秘\r\n[01:35.86]也憧憬我們遺失的過去\r\n[01:38.40]分享給大海秘密\r\n[01:41.59]藍色的海底\r\n[01:44.27]遠山的風景\r\n[01:46.90]我們的距離遙不可及\r\n[01:52.06]退守的愛情\r\n[01:54.59]還剩下回憶\r\n[01:57.12]瘋狂地尋覓你的身影\r\n[02:02.59]殘月憂郁\r\n[02:04.92]星夜靜謐\r\n[02:07.55]潮落嘆息\r\n[02:12.97]聆聽山語\r\n[02:15.45]回盪不清\r\n[02:17.87]若即若離\r\n[02:23.24]殘月憂郁\r\n[02:25.57]星夜靜謐\r\n[02:28.15]潮落嘆息\r\n[02:33.65]聆聽山語\r\n[02:36.07]回盪不清\r\n[02:38.50]若即若離\r\n",
        "author_id": "968893",
        "privilege": 8,
        "privilege2": "1000",
        "play_url": "https://webfs.ali.kugou.com/202109151914/04b52e1a2976cec776bee89f7690b6ae/G226/M05/18/14/gocBAF87mjyAfkS4ACk6bFP5o1Q758.mp3",
        "authors": [
            {
                "author_id": "968893",
                "author_name": "傅夢彤",
                "is_publish": "1",
                "sizable_avatar": "http://singerimg.kugou.com/uploadpic/softhead/{size}/20210610/20210610031307109445.jpg",
                "avatar": "http://singerimg.kugou.com/uploadpic/softhead/400/20210610/20210610031307109445.jpg"
            },
            {
                "author_id": "87264",
                "author_name": "安蘇羽",
                "is_publish": "1",
                "sizable_avatar": "http://singerimg.kugou.com/uploadpic/softhead/{size}/20190321/20190321201004866434.jpg",
                "avatar": "http://singerimg.kugou.com/uploadpic/softhead/400/20190321/20190321201004866434.jpg"
            }
        ],
        "is_free_part": 0,
        "bitrate": 128,
        "recommend_album_id": "38915273",
        "audio_id": "80133277",
        "has_privilege": true,
        "play_backup_url": "https://webfs.cloud.kugou.com/202109151914/b75e58fc8b2b687b5a4a65df5d319aa1/G226/M05/18/14/gocBAF87mjyAfkS4ACk6bFP5o1Q758.mp3"
    }
});
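Because the request carries a `callback` parameter, the response above is JSONP: the JSON body is wrapped in a call to the callback function. Below is a minimal sketch of unwrapping it before parsing. The function name `strip_jsonp` and the shortened sample string are my own illustration, not part of the original crawler.

```python
import json
import re


def strip_jsonp(text):
    """Parse a JSONP response such as 'callback({...});' and return the JSON payload."""
    match = re.fullmatch(r"\s*[\w$.]+\s*\((.*)\)\s*;?\s*", text, flags=re.DOTALL)
    if match is None:
        # No callback wrapper found: assume the text is already plain JSON.
        return json.loads(text)
    return json.loads(match.group(1))


# Shortened stand-in for the real response shown above.
sample = (
    'jQuery19108013258872165683_1631704461109('
    '{"status": 1, "err_code": 0, "data": {"timelength": 168000}});'
)
payload = strip_jsonp(sample)
print(payload["data"]["timelength"])
```

If the `callback` parameter is left out of the request, unwrapping should not be needed. The full source further down calls `.json()` directly on a response requested without `callback`.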

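Removing some parameters from the URL above in Postman leaves: https://wwwapi.kugou.com/yy/index.php?r=play/getdata&hash=BC4E172CF13BB79303203A48246D84E1&appid=1014&mid=ee9a0573ca7b9cda6b916c684b10b6da&album_id=38915273

This request returns the same data as the full one. Interesting! By elimination, the following few parameters are enough to get the data we want.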

The 4 required parameters:

1·r=play/getdata

2·hash=BC4E172CF13BB79303203A48246D84E1

3·mid=ee9a0573ca7b9cda6b916c684b10b6da

4·album_id=38915273
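The reduced request can be assembled from just these four values. A minimal sketch, assuming the endpoint stays as shown above; `build_getdata_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://wwwapi.kugou.com/yy/index.php"


def build_getdata_url(song_hash, mid, album_id):
    """Build the play/getdata URL from the four required parameters (hypothetical helper)."""
    params = {
        "r": "play/getdata",  # fixed value
        "hash": song_hash,
        "mid": mid,
        "album_id": album_id,
    }
    # safe="/" keeps the slash in "play/getdata" unescaped, matching the URL above.
    return API_ENDPOINT + "?" + urlencode(params, safe="/")


url = build_getdata_url(
    "BC4E172CF13BB79303203A48246D84E1",
    "ee9a0573ca7b9cda6b916c684b10b6da",
    "38915273",
)
print(url)
```

Building the query from a dict avoids hand-concatenating strings and makes it obvious which values still have to be found on the page.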

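Next, guess roughly what each parameter does, so we know what to look for on the page. hash and mid look like encrypted (hashed) values. We scrape songs one at a time, and each page (we are scraping the playback detail pages linked from the KuGou chart) holds exactly one song, so hash and mid presumably identify a single song. The parameter r is a fixed value.

That leaves just one parameter: album_id.

Now to find where these parameters come from.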

Target URL: https://www.kugou.com/yy/rank/home/

Target: every song on the KuGou Music rising chart

Open any song. I opened the first one: https://www.kugou.com/song/1lnk4m0b.html#hash=C2BF05871AC21B3928F63AB8EE3890EF&album_id=48707908
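Notice that this song URL already carries `hash` and `album_id` in its fragment (the part after `#`). When the fragment is present, the two values can be read straight from the URL. A minimal sketch, with `song_ids_from_page_url` as a made-up helper name; links that lack the fragment would still need another source for these values.

```python
from urllib.parse import urlsplit, parse_qs


def song_ids_from_page_url(page_url):
    """Return (hash, album_id) from a song page URL whose fragment holds both values."""
    fragment = urlsplit(page_url).fragment  # "hash=...&album_id=..."
    fields = parse_qs(fragment)
    return fields["hash"][0], fields["album_id"][0]


page_url = (
    "https://www.kugou.com/song/1lnk4m0b.html"
    "#hash=C2BF05871AC21B3928F63AB8EE3890EF&album_id=48707908"
)
song_hash, album_id = song_ids_from_page_url(page_url)
print(song_hash, album_id)
```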

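Press F12 and refresh the page. The Media tab shows one audio request: https://webfs.ali.kugou.com/202109161024/9fe5a6a564d487a4701d1454223ea008/KGTX/CLTX001/c2bf05871ac21b3928f63ab8ee3890ef.mp3

Screenshot:

See it? There's a cheeky cookie grinning at you: kg_mid. Let's look for the other parameters.

Since the cookie holds data, I also checked localStorage, and sure enough, I found exactly what I needed there. Screenshot: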

 

 

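Now all the parameters are in hand.

The next step is to open each song page in a browser from Python, read the localStorage data, request the song data, and save it to MySQL. Let's try the first 20 songs!

Tip: reading localStorage requires chromedriver.exe to drive Google Chrome. The driver version should be close to your installed Chrome version. It can be downloaded online, so I won't go into detail here.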
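Before the full source, here is a small sketch of the decoding step on its own. The `jStorage` value in localStorage is a JSON string, and its `k_play_list` entry is itself a JSON string, so it has to be decoded twice. The sample value below is fabricated to mimic that shape (only the two fields used here), and `first_track` is a hypothetical helper name.

```python
import json


def first_track(jstorage_raw):
    """Return (hash, album_id) of the first entry in jStorage's k_play_list."""
    storage = json.loads(jstorage_raw)               # first decode: the jStorage object
    play_list = json.loads(storage["k_play_list"])   # second decode: the nested JSON string
    track = play_list[0]
    return track["hash"], str(track["album_id"])


# Fabricated sample with the same double-encoded shape as the real value.
sample = json.dumps({
    "k_play_list": json.dumps([
        {"hash": "C2BF05871AC21B3928F63AB8EE3890EF", "album_id": 48707908}
    ])
})
print(first_track(sample))
```

In the real crawler the raw string comes from `localStorage.getItem('jStorage')`, executed in the browser through Selenium.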

 

Here is the full source code, including the MySQL connection.

import json
import time

import pymysql
import requests
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:58.0) Gecko/20100101 Firefox/58.0'}

# Fixed values; callback and _ are optional
# dfid = '2C8TCD3wtvFq3A2P4h4Slbtf'
appid = '1014'
# mid = 'ee9a0573ca7b9cda6b916c684b10b6da'
platid = '4'
r = 'play/getdata'

# Path to chromedriver.exe; adjust it to your own machine
CHROMEDRIVER_PATH = r'C:\Users\Administrator\AppData\Local\Programs\Python\Python39\chromedriver.exe'


def top(conn, cursor, url):
    """Scrape one chart page and process every song listed on it."""
    html = requests.get(url, headers=headers)
    soup = BeautifulSoup(html.text, 'lxml')
    ranks = soup.select('.pc_temp_num')
    links = soup.select('.pc_temp_songname')  # the same element holds the title and the href
    durations = soup.select('.pc_temp_time')
    for rank, link, duration in zip(ranks, links, durations):
        data = {
            'NO': rank.get_text().strip(),
            'titles': link.get_text(),
            'time': duration.get_text().strip(),
            'href': link.get('href')}
        print(data)
        GetCurrentPageLocalStorage(conn, cursor, link.get('href'))


# Read jStorage and kg_mid from the song page's localStorage (plus the cookies),
# then request the song data and store it
def GetCurrentPageLocalStorage(conn, cursor, href):
    # Selenium 4 takes the driver path through a Service object
    browser = webdriver.Chrome(service=Service(executable_path=CHROMEDRIVER_PATH))
    try:
        browser.get(href)
        kg_mid = browser.execute_script("return localStorage.getItem('kg_mid')")
        jStorage = browser.execute_script("return localStorage.getItem('jStorage')")
        cookies = browser.get_cookies()
        # print('kg_mid:', kg_mid, "jStorage:", jStorage, 'cookies:', cookies)
    finally:
        browser.quit()  # close the browser even if the page fails to load

    # jStorage is a JSON string whose k_play_list entry is itself a JSON string
    play_list = json.loads(json.loads(jStorage)["k_play_list"])

    mid = kg_mid
    song_hash = play_list[0]["hash"]
    album_id = play_list[0]["album_id"]

    api_url = ('https://wwwapi.kugou.com/yy/index.php?r=' + r + '&hash=' + str(song_hash)
               + '&mid=' + str(mid) + '&platid=' + str(platid) + '&album_id=' + str(album_id))
    html = requests.get(url=api_url)
    # print('response:', html.json())
    Deposit(conn, cursor, html.json())


# Create the table
def createTable(con, cs):
    # lyrics uses text because long lyrics can exceed a 2000-character varchar
    cs.execute("""create table if not exists kugouMusic (
        audio_name varchar(1000), img varchar(1000), song_name varchar(100) primary key,
        timelength varchar(100), filesize varchar(100), language varchar(100),
        video_id varchar(100),
        author_name varchar(100), album_id varchar(100), play_backup_url varchar(500),
        lyrics text, play_url varchar(1000))""")
    # Commit the transaction
    con.commit()


# Store one song in the database. Column order:
# audio_name img song_name timelength filesize language video_id author_name album_id play_backup_url lyrics play_url
def Deposit(con, cs, data):
    description = data['data']
    lyrics = description['lyrics']
    img = description['img']
    song_name = description['song_name']
    timelength = description['timelength']
    filesize = description['filesize']
    # The sample response has no 'language' field; 'privilege' is stored in that column
    language = description['privilege']
    video_id = description['video_id']
    album_id = description['album_id']
    play_backup_url = description['play_backup_url']
    play_url = description['play_url']
    author_name = description['author_name']
    audio_name = description['audio_name']
    try:
        cs.execute(
            'insert into kugouMusic values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)',
            (audio_name, img, song_name, timelength, filesize, language, video_id,
             author_name, album_id, play_backup_url, lyrics, play_url))
        con.commit()
    except pymysql.MySQLError as err:
        # For example a duplicate song_name (the primary key): report it instead of hiding it
        print('insert failed:', err)
        con.rollback()


if __name__ == '__main__':
    # range(1, 2) covers only the first chart page; widen the range to fetch more pages
    urls = ['http://www.kugou.com/yy/rank/home/{}-8888.html'.format(i) for i in range(1, 2)]
    # Connect to the MySQL database
    conn = pymysql.connect(host='127.0.0.1', user='root', password='123', database='music', charset='utf8')
    cursor = conn.cursor()
    createTable(conn, cursor)
    for url in urls:
        time.sleep(5)
        top(conn, cursor, url)

    # Close the database connection
    cursor.close()
    conn.close()
    print('All pages have been scraped!')
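One thing to watch when re-running the crawler: `song_name` is the primary key, so inserting a song that is already stored raises a duplicate-key error. The sketch below shows one way to skip duplicates explicitly. It uses an in-memory SQLite database as a stand-in so it runs without a MySQL server, and it keeps only four of the columns. The function name `save_song` and the placeholder URL are my own. With pymysql the placeholders would be `%s`, and the MySQL spelling of the statement is `INSERT IGNORE`.

```python
import sqlite3

COLUMNS = ("song_name", "author_name", "timelength", "play_url")


def save_song(conn, song):
    """Insert one song; return True if a row was added, False if song_name already existed."""
    row = tuple(song.get(col) for col in COLUMNS)
    cur = conn.execute(
        "INSERT OR IGNORE INTO kugouMusic (song_name, author_name, timelength, play_url) "
        "VALUES (?, ?, ?, ?)",
        row,
    )
    conn.commit()
    return cur.rowcount == 1


# In-memory SQLite stand-in for the MySQL table (reduced to four columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kugouMusic ("
    "song_name TEXT PRIMARY KEY, author_name TEXT, timelength INTEGER, play_url TEXT)"
)

song = {
    "song_name": "潮汐 (Natural)",
    "author_name": "傅夢彤、安蘇羽",
    "timelength": 168000,
    "play_url": "https://example.invalid/song.mp3",  # placeholder, not a real playback URL
}
first = save_song(conn, song)
second = save_song(conn, song)  # same song_name again: skipped
count = conn.execute("SELECT COUNT(*) FROM kugouMusic").fetchone()[0]
print(first, second, count)
```

Returning a flag makes it easy to log how many songs were new on each run instead of silently ignoring failures.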

 

