scrapy download delay, CONCURRENT_REQUESTS

Reposted article. 2021-09-12 05:11. Category: A5-scrapy

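Setting a download delay has at least two benefits. First, it is polite to the site being crawled. Second, if you crawl too fast, many servers will ban your IP or throttle your access.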

Effect: roughly one request every x seconds.
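The "roughly" matters. A minimal sketch of why the interval is not exact, assuming Scrapy's default RANDOMIZE_DOWNLOAD_DELAY = True, under which the wait is drawn from between 0.5 and 1.5 times DOWNLOAD_DELAY. The helper name delay_range is hypothetical and is not part of Scrapy:

```python
import random


def delay_range(download_delay, randomize=True):
    """Return the (min, max) wait in seconds for a given DOWNLOAD_DELAY.

    Hypothetical helper: it only mirrors the documented 0.5x to 1.5x
    range that applies when RANDOMIZE_DOWNLOAD_DELAY is enabled.
    """
    if randomize:
        return (0.5 * download_delay, 1.5 * download_delay)
    return (download_delay, download_delay)


low, high = delay_range(5)
print(low, high)  # 2.5 7.5

# One possible wait between two consecutive requests to the same site.
sample_wait = random.uniform(low, high)
```

So with DOWNLOAD_DELAY = 5 the gap between requests should fall somewhere between 2.5 and 7.5 seconds, unless randomization is turned off.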

First, set up a project to explore how CONCURRENT_REQUESTS and DOWNLOAD_DELAY interact.

Rough code:

jianshuspider.py:

import scrapy
from scrapy.selector import Selector

from JianshuSpider_author_1.items import JianshuspiderAuthor1Item


class JianshuSpider(scrapy.Spider):
    name = "jianshu"

    def start_requests(self):
        urls = ['http://www.jianshu.com/users/958f740aed52/followers']
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse_author)

    def parse_author(self, response):
        item = JianshuspiderAuthor1Item()
        selector = Selector(response)
        fans_href = selector.xpath("//div[@class='info']/a/@href").extract()

        for fan_href in fans_href:
            # Recurse into each follower's own followers page on Jianshu.
            fan_href = 'http://www.jianshu.com/users/' + fan_href.split('/')[-1] + '/followers'
            # Use this line instead when the requests should time out:
            # fan_href = 'http://www.google.com.hk/' + fan_href.split('/')[-1] + '/followers'
            yield scrapy.Request(fan_href, callback=self.parse_author)

        item['author'] = selector.xpath("//div[@class='title']/a/text()").extract_first()
        yield item

requestlimit.py (downloader middleware):

class RequestLimitMiddleware(object):
    count = 0

    def process_request(self, request, spider):
        # Count and print every request that reaches the downloader middleware.
        self.count += 1
        print(self.count)

The two files above contain the core code.
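Counting requests shows how many arrive, but not when. A minimal sketch of a variant that also records the gap between consecutive requests, which makes the "every 5 seconds or so" pattern easier to read off the console. The class name TimedRequestLimitMiddleware and its intervals attribute are hypothetical and not part of the original project:

```python
import time


class TimedRequestLimitMiddleware:
    """Hypothetical variant of RequestLimitMiddleware that logs timing."""

    def __init__(self):
        self.count = 0
        self.last_seen = None   # monotonic timestamp of the previous request
        self.intervals = []     # seconds between consecutive requests

    def process_request(self, request, spider):
        now = time.monotonic()
        self.count += 1
        if self.last_seen is not None:
            gap = now - self.last_seen
            self.intervals.append(gap)
            print(f"request #{self.count}, {gap:.2f}s after the previous one")
        else:
            print(f"request #{self.count}")
        self.last_seen = now
        # Returning None lets Scrapy continue handling the request as usual.
        return None
```

Like any downloader middleware, it would still need to be registered under DOWNLOADER_MIDDLEWARES in settings.py before Scrapy calls it.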

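Test results:

1:

settings.py

CONCURRENT_REQUESTS = 8

DOWNLOAD_DELAY = 0

In jianshuspider.py, comment out the recursive Jianshu link and enable the Google link line, so that every follow-up request times out.

Effect: 8 requests arrive at once and time out together. Then another 8 arrive and time out together, and so on.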

2:

settings.py

CONCURRENT_REQUESTS = 1

DOWNLOAD_DELAY = 5

In jianshuspider.py, enable the recursive Jianshu link and comment out the Google link line.

Effect: roughly one request every 5 seconds.

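3:

settings.py

CONCURRENT_REQUESTS = 2

DOWNLOAD_DELAY = 5

In jianshuspider.py, enable the recursive Jianshu link and comment out the Google link line.

Effect: 2 requests (A, B) arrive at the start, but after 5 seconds only one of them (A) has been processed, and a new request (C) arrives. After another 5 seconds, the next request (B) is processed and another request (D) joins the queue. This pattern repeats.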

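Summary:

DOWNLOAD_DELAY limits the effect of CONCURRENT_REQUESTS. Once a delay is set, requests to the same site are processed one at a time, spaced by the delay, so the configured concurrency never becomes visible.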
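The pattern seen in the three tests can be reproduced with a small simulation. This is a simplified model written for illustration, not Scrapy's actual downloader: it assumes a single site, a fixed delay with no randomization, and a fixed response time. The function simulate_start_times and its parameters are hypothetical.

```python
import heapq


def simulate_start_times(n_requests, concurrency, delay, latency):
    """Return the time at which each download starts.

    A download may start only when (a) at least `delay` seconds have
    passed since the previous start and (b) fewer than `concurrency`
    downloads are still running. Each download takes `latency` seconds.
    """
    in_flight = []      # min-heap of finish times of running downloads
    starts = []
    last_start = None
    for _ in range(n_requests):
        # Earliest start allowed by the delay rule.
        t = 0.0 if last_start is None else last_start + delay
        # Drop downloads that have already finished by time t.
        while in_flight and in_flight[0] <= t:
            heapq.heappop(in_flight)
        # If every concurrency slot is busy, wait for the earliest finish.
        if len(in_flight) >= concurrency:
            t = heapq.heappop(in_flight)
        starts.append(t)
        heapq.heappush(in_flight, t + latency)
        last_start = t
    return starts


# No delay: downloads start in bursts of `concurrency`, as in test 1.
print(simulate_start_times(4, concurrency=2, delay=0, latency=3))
# [0.0, 0.0, 3.0, 3.0]

# With a delay: one start every 5 seconds, whatever the concurrency.
print(simulate_start_times(4, concurrency=2, delay=5, latency=1))
# [0.0, 5.0, 10.0, 15.0]
```

In this model, as soon as the delay is longer than the response time, each download has finished before the next one is allowed to start, so raising the concurrency changes nothing.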

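Thoughts:

With CONCURRENT_REQUESTS set and no DOWNLOAD_DELAY, the server receives a large number of requests at the same moment.

With both CONCURRENT_REQUESTS and DOWNLOAD_DELAY set, the server does not receive a large number of requests at the same moment.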
