Python Web Scraping with requests: Scraping the Top 10 Chinese Male Singers' Songs on NetEase Cloud Music

Reposted article · 2018-03-04 19:36 · python

requests is an HTTP client library for Python. It is similar to urllib and urllib2 but far more concise. For how to use requests, I recommend this blog post: https://cuiqingcai.com/2556.html

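Without further ado, here is the preparation:

1. Install the libraries you need: requests, BeautifulSoup (parses HTML and XML strings), and xlwt (writes the scraped data into an Excel sheet).

2. For how BeautifulSoup parses HTML, I recommend this blog post: http://blog.csdn.net/u013372487/article/details/51734047

3. The re library: we use regular expressions to filter the scraped content.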

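OK, let's start scraping.

First, find the URL of the entry page for Chinese male singers on NetEase Cloud Music:

    url = 'http://music.163.com/discover/artist/cat?id=1001'

Fetch the whole page and parse it:

    html = requests.get(url).text
    soup = BeautifulSoup(html, 'html.parser')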

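Next we need the URLs of the top 10 singers' pages. Using the browser's developer tools, we find that each singer's information sits inside a <div class="u-cover u-cover-5">......</div> tag.

[Figure: the singer markup as seen in the browser's developer tools]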

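So we filter out the top 10 singers' information. Each result is a pair: the relative URL of the singer's page and the singer's name. The page is likely to use the simplified character 乐 in the title attribute, so the pattern accepts both forms:

    top_10 = soup.find_all('div', attrs={'class': 'u-cover u-cover-5'})

    singers = []
    for i in top_10:
        singers.append(re.findall(r'<a class="msk" href="(/artist\?id=\d+)" title="(.*?)的音[乐樂]"></a>', str(i))[0])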
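To see what the regular expression captures, here is a minimal sketch that runs it against a hand-written fragment. The sample markup, the artist id 12345, and the singer name are invented for illustration and only mimic the structure described above; the real page may differ.

```python
import re

# Hypothetical fragment modelled on the <div class="u-cover u-cover-5"> block;
# the id and the singer name are invented for this example.
sample_div = (
    '<div class="u-cover u-cover-5">'
    '<img src="cover.jpg"/>'
    '<a class="msk" href="/artist?id=12345" title="Example Singer的音乐"></a>'
    '</div>'
)

# Group 1 is the relative artist URL, group 2 is the singer name.
# [乐樂] accepts both the simplified and the traditional character.
pattern = re.compile(
    r'<a class="msk" href="(/artist\?id=\d+)" title="(.*?)的音[乐樂]"></a>'
)

match = pattern.search(sample_div)
singer = match.groups() if match else None
print(singer)
```

Using `search` and checking the result, rather than indexing `findall(...)[0]`, avoids an IndexError when a block does not match the pattern.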

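Once we have the singers' information, we visit each singer's page in turn, scrape their hot songs, and write them to an Excel sheet. The principle is the same as above.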
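The two small steps involved can be sketched with the standard library alone: building the absolute URL of a singer's page, and reading album names from the JSON text that the singer page keeps in a hidden textarea. The JSON layout used here (a list of song objects, each with a nested "album" object that has a "name" field) is an assumption inferred from the regular expression in the complete code, and the sample data is invented. The helper names `artist_url` and `album_names` are hypothetical.

```python
import json
from urllib.parse import urljoin

BASE_URL = 'http://music.163.com'


def artist_url(href):
    """Turn a relative link such as '/artist?id=12345' into an absolute URL."""
    return urljoin(BASE_URL, href)


def album_names(textarea_text):
    """Return the album name of every song in the JSON text.

    Assumes a JSON list of song objects, each with an "album" object
    that has a "name" field. This layout is an assumption.
    """
    songs = json.loads(textarea_text)
    return [song['album']['name'] for song in songs]


# Invented sample data, for illustration only.
sample_json = (
    '[{"name": "Song A", "album": {"name": "Album X"}},'
    ' {"name": "Song B", "album": {"name": "Album Y"}}]'
)

print(artist_url('/artist?id=12345'))
print(album_names(sample_json))
```

Parsing the JSON instead of matching it with a regular expression keeps each album tied to its own song, which may matter if a song name itself contains the text "album".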

Here is the complete code:

import re

import requests
import xlwt
from bs4 import BeautifulSoup

# Entry page for Chinese male singers
url = 'http://music.163.com/discover/artist/cat?id=1001'
r = requests.get(url)
r.raise_for_status()
r.encoding = r.apparent_encoding
html = r.text  # the whole page

soup = BeautifulSoup(html, 'html.parser')
top_10 = soup.find_all('div', attrs={'class': 'u-cover u-cover-5'})

# Each entry is (relative artist URL, singer name).
# [乐樂] accepts both the simplified and the traditional character.
singer_pattern = re.compile(
    r'<a class="msk" href="(/artist\?id=\d+)" title="(.*?)的音[乐樂]"></a>')
singers = []
for div in top_10:
    match = singer_pattern.search(str(div))
    if match:  # skip blocks that do not match instead of raising IndexError
        singers.append(match.groups())

base_url = 'http://music.163.com'
for href, name in singers:
    try:
        new_url = base_url + href
        songs = requests.get(new_url).text
        soup = BeautifulSoup(songs, 'html.parser')
        info = soup.find_all('textarea', attrs={'style': 'display:none;'})[0]
        songs_url_and_name = soup.find_all('ul', attrs={'class': 'f-hide'})[0]

        albums = re.findall(r'"album".*?"name":"(.*?)"', info.text)
        links = re.findall(r'<li><a href="(/song\?id=\d+)">(.*?)</a></li>',
                           str(songs_url_and_name))

        # Rows of [song title, album, song link]; zip stops at the shorter list
        datas = [[title, album, 'http://music.163.com/#' + link]
                 for (link, title), album in zip(links, albums)]

        book = xlwt.Workbook()
        sheet1 = book.add_sheet('sheet1', cell_overwrite_ok=True)
        sheet1.col(0).width = 25 * 256
        sheet1.col(1).width = 30 * 256
        sheet1.col(2).width = 40 * 256

        # Header row: song title, album, song link
        heads = ['歌曲名稱', '專輯', '歌曲鏈接']
        for col, head in enumerate(heads):
            sheet1.write(0, col, head)

        # Data rows start at row 1, below the header
        for row, data in enumerate(datas, start=1):
            for col, value in enumerate(data):
                sheet1.write(row, col, value)

        book.save(name + '.xls')  # path of the output file
    except Exception as exc:
        # Report the failure and move on to the next singer
        print('Skipping', name, ':', exc)
        continue
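The script uses the singer's name directly as the file name. A name containing characters such as / or : would make the save fail on some systems, and the error would be caught by the surrounding except clause, so that singer would be skipped. Below is a minimal sketch of a sanitizing helper; `safe_filename` is a hypothetical name that does not appear in the original code, and the set of replaced characters is an assumption based on common file system restrictions.

```python
import re


def safe_filename(name, replacement='_'):
    """Replace characters that are not allowed in file names on common systems."""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, name).strip()
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or 'unnamed'


print(safe_filename('AC/DC') + '.xls')
```

With such a helper, the save call could become `book.save(safe_filename(name) + '.xls')`.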
