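When crawling Weibo, keep a close eye on your request rate: if you send requests too fast, the server starts rejecting them and blocks your IP for a while.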


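Basic test script (Python). It requests one page of the search API at a gradually increasing rate, 5 minutes per rate, until a request fails. It then polls every 10 seconds to measure how long the IP stays blocked: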

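import time
import requests

# Weibo mobile search API (keyword "疫情", page 2) used as the test target
URL = "https://m.weibo.cn/api/container/getIndex?containerid=100103type%3D61%26q%3D%E7%96%AB%E6%83%85%26t%3D0&page_type=searchall&page=2"

def test_ip_freq(freq, duration=5 * 60):
    """Request URL at roughly `freq` requests/s for `duration` seconds.
    Returns 'success' if every request got HTTP 200, otherwise 'fail'."""
    if freq <= 0:
        return None
    delay = 1 / freq          # pause after each request
    t0 = time.time()
    requests_num = 0
    status = "success"
    while True:
        r = requests.get(URL, timeout=10)
        if r.status_code != 200:      # treat any non-200 response as blocked
            status = "fail"
            break
        requests_num += 1
        if time.time() - t0 > duration:   # test each rate for 5 minutes
            break
        time.sleep(delay)
    elapsed = time.time() - t0
    print("Target rate {0}/s, status: {1}, total requests {2}, elapsed {3}s, actual rate {4}/s".format(
        freq, status, requests_num, elapsed, requests_num / elapsed))
    return status

# Raise the rate step by step until the IP gets blocked
status = None
for i in [0.3, 0.35, 0.4, 0.45, 0.5]:
    status = test_ip_freq(i)
    if status == "fail":
        break

# Measure how long the block lasts: poll every 10 s until HTTP 200 comes back
if status == "fail":
    t0 = time.time()
    while True:
        r = requests.get(URL, timeout=10)
        if r.status_code == 200:
            break
        time.sleep(10)
    print("IP blocked for {0}s".format(time.time() - t0))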
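In the script above, time.sleep(delay) runs after each request has finished, so the request's own latency is added to every interval. That is why the measured actual rate comes out below the target rate. If you want a crawler to hold a chosen rate more precisely, one option is to schedule requests against a monotonic clock instead. A minimal sketch; RateLimiter is a hypothetical helper, not part of requests or any Weibo API:

```python
import time

class RateLimiter:
    """Fixed-interval scheduler allowing at most `rate` calls per second.

    Unlike sleeping a fixed 1/rate after each request, it measures the time
    since the previous slot, so request latency counts toward the interval.
    """

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        if rate <= 0:
            raise ValueError("rate must be positive")
        self.interval = 1.0 / rate
        self.clock = clock          # injectable for testing
        self.sleep = sleep          # injectable for testing
        self.next_time = None       # earliest time the next call may start

    def wait(self):
        """Block until the next request is allowed to start."""
        now = self.clock()
        if self.next_time is not None and now < self.next_time:
            self.sleep(self.next_time - now)
            now = self.next_time
        # If the previous request overran the interval, start a fresh slot
        # from now instead of bursting to catch up.
        self.next_time = now + self.interval
```

Usage: create `limiter = RateLimiter(0.3)` once and call `limiter.wait()` before each `requests.get(...)`. The `clock` and `sleep` parameters exist only so the timing can be checked without real waiting.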

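Test results:

Target rate 0.3/s, status: success, total requests 81, elapsed 303.2352440357208s, actual rate 0.2671193457659502/s
Target rate 0.35/s, status: success, total requests 91, elapsed 302.8865134716034s, actual rate 0.30044256166107425/s
Target rate 0.4/s, status: fail, total requests 53, elapsed 164.40774130821228s, actual rate 0.3223692484202544/s
IP blocked for 183s

So an actual rate of about 0.30 requests/s (one request every ~3.3 s) ran for 5 minutes without trouble. A target of 0.4/s (about 0.32/s actual) was blocked after 53 requests in about 164 s, and the block lifted after about 3 minutes.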
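In this test the block lifted after about 183 s, which suggests a simple recovery strategy: when a request comes back with a non-200 status, pause for a few minutes before retrying instead of retrying right away. A minimal sketch; fetch_with_cooldown and its 190 s default are assumptions based on this single measurement, and the real block duration may differ:

```python
import time

def fetch_with_cooldown(fetch, cooldown=190.0, max_retries=3, sleep=time.sleep):
    """Call fetch() until it returns a response with status_code 200.

    fetch: zero-argument callable returning an object with .status_code,
           e.g. lambda: requests.get(url, timeout=10).
    cooldown: seconds to wait after a non-200 response (assumed value,
              slightly above the ~183 s block measured above).
    Returns the successful response; raises RuntimeError once max_retries
    cooldowns have been spent and the last attempt still failed.
    """
    for attempt in range(max_retries + 1):
        resp = fetch()
        if resp.status_code == 200:
            return resp
        if attempt < max_retries:
            sleep(cooldown)  # assume the IP is blocked; wait it out
    raise RuntimeError("still blocked after {} cooldowns".format(max_retries))
```

As in the test script, any non-200 status is treated as a block; a real crawler may want to distinguish other errors.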

 

Recommended HTTPS proxy:

Zhima Proxy (芝麻代理): http://h.zhimaruanjian.com/
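If a single IP's limit is too slow, a common approach is to spread requests over several proxy IPs and keep each one under the rate that worked above. A minimal sketch, assuming you already have a list of proxy URLs (the addresses in the docstring are placeholders, and obtaining IPs from a provider's API is provider-specific and not shown). proxy_pool is a hypothetical helper that builds dicts in the format accepted by the `proxies=` argument of requests:

```python
import itertools

def proxy_pool(proxy_urls):
    """Return an endless iterator of requests-style proxies dicts,
    rotating through proxy_urls in round-robin order.

    proxy_urls: list like ["http://1.2.3.4:8000", ...] (placeholders).
    """
    if not proxy_urls:
        raise ValueError("need at least one proxy URL")
    # requests picks the proxy by the target URL's scheme, so the same
    # proxy is registered for both http:// and https:// targets.
    return ({"http": url, "https": url} for url in itertools.cycle(proxy_urls))
```

Usage: `pool = proxy_pool([...])`, then `requests.get(url, proxies=next(pool), timeout=10)` for each request. Whether N proxies really allow about N times the single-IP rate was not tested here; it assumes Weibo limits requests per IP.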

 

