Python: on a function recursively calling itself


Scraping the short reviews of 博人傳 (Boruto) on b站 (Bilibili).

Each page returns 20 short reviews, and there are more than 1000 pages in total.

The code is as follows:

import requests
import json
import csv
import re  # needed for the cursor/media-id regexes below


def main(start_url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.75 Safari/537.36'}
    res = requests.get(url=start_url, headers=headers).content.decode()
    data = json.loads(res)
    try:
        data = data['result']['list']
    except KeyError:
        # the response carries no review list, so there is nothing left to crawl
        print('-----------')
        return
    # the API returns a cursor that points at the next page of short reviews
    cursor = re.findall(r'"cursor":"(\d+)",', res)

    for i in data:
        mid = i['author']['mid']
        uname = i['author']['uname']
        content = i['content'].strip()
        try:
            last_index_show = i['user_season']['last_index_show']
        except KeyError:
            last_index_show = None

        print(mid, uname, content, last_index_show)
        print('------------------------')

        with open('borenzhuan_duanping.csv', 'a', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            writer.writerow([mid, uname, content, last_index_show])

    if cursor:
        # recurse into the next page by appending the cursor to the URL
        next_url = 'https://bangumi.bilibili.com/review/web_api/short/list?media_id={}&folded=0&page_size=20&sort=0&cursor='.format(media_id) + cursor[0]
        main(next_url)
    else:
        print('抓取完成')  # crawl finished


if __name__ == '__main__':
    zhuye_url = 'https://www.bilibili.com/bangumi/media/md5978/'
    media_id = re.findall(r'md(\d+)', zhuye_url)[0]  # extract the media id from the show's home page URL
    start_url = 'https://bangumi.bilibili.com/review/web_api/short/list?media_id={}&folded=0&page_size=20&sort=0&cursor='.format(media_id)

    main(start_url)

 

During the crawl I found that once the recursion reaches a depth of 999, an exception is raised:

RecursionError: maximum recursion depth exceeded in comparison

This exception is raised while the function is recursively calling itself.
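CPython limits how deep the interpreter's call stack may grow: sys.getrecursionlimit() defaults to 1000, and since the crawler calls main() once more for every page, roughly a thousand pages is enough to hit that limit. A minimal, self-contained illustration of the same failure mode (the countdown function is purely illustrative and not part of the crawler):

import sys

def countdown(n):
    # every nested call keeps one more frame on the interpreter stack
    if n == 0:
        return 0
    return countdown(n - 1)

print(sys.getrecursionlimit())   # 1000 by default in CPython
print(countdown(500))            # fine: well under the limit
print(countdown(5000))           # RecursionError: maximum recursion depth exceeded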

The fix is simply to add the following at the top of the program:

import sys
sys.setrecursionlimit(100000)

Raising the limit lets the recursion go deeper than the default 1000 frames. Keep in mind that every pending call still holds a stack frame, so don't raise it beyond what the crawl actually needs, or memory use can blow up.
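For reference, the recursion can also be avoided altogether by following the cursor in a loop, so the call stack never grows no matter how many pages there are. A minimal sketch reusing the same endpoint and fields as the script above (crawl() is just an illustrative name, and it drops the last_index_show column for brevity):

import requests
import json
import csv
import re

def crawl(start_url):
    # iterative counterpart of the recursive main(): keep following the
    # cursor until the API stops returning one
    headers = {'User-Agent': 'Mozilla/5.0'}
    url = start_url
    while url:
        res = requests.get(url, headers=headers).content.decode()
        body = json.loads(res).get('result') or {}
        reviews = body.get('list') or []
        with open('borenzhuan_duanping.csv', 'a', newline='', encoding='utf-8') as f:
            writer = csv.writer(f)
            for i in reviews:
                writer.writerow([i['author']['mid'], i['author']['uname'], i['content'].strip()])
        cursor = re.findall(r'"cursor":"(\d+)",', res)
        url = start_url + cursor[0] if cursor else None
    print('抓取完成')  # crawl finished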

