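"""
Simple Bilibili danmaku (bullet comment) scraper

Purpose: fetch a video's danmaku and save them to a .txt file.
Usage: find the aid of the Bilibili video and assign it in the
if __name__ == '__main__': block as av='a string of digits, i.e. the aid'.
Finding the aid: below the video there is a share button. Hover over it and the
embed code (an <iframe>) it shows contains the aid. Alternatively, open the
browser's developer tools (Inspect), switch to the Network tab, refresh the
page, and look in the Name column for a request URL that contains the aid.

Steps:
1. Get the cid from the aid: strip the "av" prefix and surrounding spaces from
   the id, request the pagelist URL, parse the response text into a dict with
   json, and return the cid.
2. Fetch the danmaku by cid: request the danmaku URL, decode the response as
   UTF-8, extract every comment with a regular expression (findall), and
   return the list.
3. Save the danmaku to a file.
"""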
import requests
import json
import re
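
# A stricter version of the "strip the av prefix" step, given as a minimal
# sketch. normalize_aid is a hypothetical helper that the rest of this script
# does not call; it assumes an aid is always a run of decimal digits,
# optionally prefixed with "av" in any letter case.
import re


def normalize_aid(av):
    """Return the numeric aid from inputs like 'av890452353' or ' 890452353 '."""
    # Hypothetical helper, not part of the original script.
    match = re.fullmatch(r'(?:av)?(\d+)', av.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f'not a valid aid: {av!r}')
    return match.group(1)

# Unlike str.strip('av'), which removes any leading or trailing 'a'/'v'
# characters, this rejects inputs that are not an aid at all, such as a BV id.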
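def getcid(av):
    """Return the cid of the first part of the video with the given aid."""
    # Accept 'av890452353' as well as '890452353', with or without spaces.
    av = av.strip().lower().lstrip('av')
    url = f'https://api.bilibili.com/x/player/pagelist?aid={av}&jsonp=jsonp'
    res = requests.get(url, timeout=10)
    res.raise_for_status()  # fail loudly on HTTP errors
    res_dict = json.loads(res.text)
    # Take the cid of the first entry in data (the first part of the video).
    cid = res_dict['data'][0]['cid']
    return cid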
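
# getcid only returns data[0]['cid'], i.e. the first part of the video. The
# pagelist "data" list appears to hold one entry per part (an assumption based
# on the endpoint name). The hypothetical helper below is a minimal sketch that
# collects every cid from the pagelist response text; it relies only on the
# "data" and "cid" fields that getcid already uses and does no network access.
import json


def extract_cids(pagelist_text):
    """Return a list with the cid of every entry in a pagelist response."""
    # Hypothetical helper, not part of the original script.
    payload = json.loads(pagelist_text)
    # Assumption: "data" may be missing or null when the request fails.
    parts = payload.get('data') or []
    return [part['cid'] for part in parts]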
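def getdanmu(cid):
    """Return the text of every danmaku comment for the given cid."""
    url = f'https://api.bilibili.com/x/v1/dm/list.so?oid={cid}'
    res = requests.get(url, timeout=10)
    res.raise_for_status()  # fail loudly on HTTP errors
    de = res.content.decode('utf-8')
    # Each comment is an element like <d p="...">text</d>; the \b keeps the
    # pattern from matching other tags whose names merely start with "d".
    ch = re.compile(r'<d\b[^>]*>(.*?)</d>')
    danmu = ch.findall(de)
    return danmu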
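
# getdanmu's regex returns the raw text between the <d> tags, so XML entities
# such as &amp; or &lt; stay escaped in the saved file. A minimal sketch of an
# alternative that uses the standard-library XML parser instead.
# parse_danmu_xml is a hypothetical helper; it assumes the list.so response
# body is well-formed XML whose comments are <d p="..."> elements, and (an
# assumption about the format) that the first comma-separated field of p is
# the comment's playback time in seconds.
import xml.etree.ElementTree as ET


def parse_danmu_xml(xml_bytes):
    """Return (playback_time, text) pairs sorted by playback time."""
    # Hypothetical helper, not called by the original script.
    root = ET.fromstring(xml_bytes)
    items = []
    for d in root.iter('d'):
        first_field = d.get('p', '').split(',')[0]
        try:
            seconds = float(first_field)
        except ValueError:
            seconds = 0.0  # fall back when p is missing or malformed
        items.append((seconds, d.text or ''))
    items.sort(key=lambda item: item[0])
    return items

# Usage: parse_danmu_xml(res.content), where res is the list.so response.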
def savedanmu(danmu, filename):
    """Write the comments to filename as UTF-8, each followed by a space."""
    with open(filename, mode='w', encoding='utf-8') as f:
        for w in danmu:
            f.write(w)
            f.write(' ')
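
# savedanmu separates comments with a single space, so a comment that itself
# contains spaces cannot be told apart from two comments when the file is read
# back. A minimal sketch of a one-comment-per-line alternative follows. The
# function names are hypothetical, and line breaks inside a comment are
# replaced with spaces so that each comment stays on one line.
def save_danmu_lines(danmu, filename):
    """Write one comment per line (hypothetical helper)."""
    with open(filename, mode='w', encoding='utf-8') as f:
        for w in danmu:
            f.write(w.replace('\r', ' ').replace('\n', ' ') + '\n')


def load_danmu_lines(filename):
    """Read back a file written by save_danmu_lines (hypothetical helper)."""
    with open(filename, encoding='utf-8') as f:
        return [line.rstrip('\n') for line in f]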
# Example embed code shown by the share button; the aid appears in the src URL:
# <iframe src="//player.bilibili.com/player.html?aid=545259775&bvid=BV1Sq4y1E7HQ&cid=331019617&page=1"
# scrolling="no" border="0" frameborder="no" framespacing="0" allowfullscreen="true"> </iframe>
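
# The iframe embed code above carries the aid (and also bvid and cid) in the
# query string of its src URL. A minimal sketch that reads them out with the
# standard library; embed_params is a hypothetical helper, and it assumes the
# embed code contains a src="..." attribute shaped like the example above.
import re
from urllib.parse import parse_qs, urlparse


def embed_params(iframe_html):
    """Return the query parameters of the iframe's src URL as a dict of strings."""
    # Hypothetical helper, not part of the original script.
    match = re.search(r'src="([^"]+)"', iframe_html)
    if match is None:
        raise ValueError('no src attribute found')
    query = urlparse(match.group(1)).query
    # parse_qs maps each key to a list of values; keep the first one.
    return {key: values[0] for key, values in parse_qs(query).items()}

# Since the embed code also contains a cid, that value could presumably be
# passed straight to getdanmu, skipping the pagelist request for that part.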
if __name__ == '__main__':
    av = '890452353'
    cid = getcid(av)
    danmu = getdanmu(cid)
    savedanmu(danmu, f'{av}.txt')