Scrapy: the callback passed to Request is never executed

Reposted article. 2018-01-13 12:24

In Scrapy, consider a request created like this:

scrapy.Request(url, headers=self.header, callback=self.parse_detail)

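While debugging, I found that the callback parse_detail was never called. The likely cause is that the request was filtered out: the offsite/filtered entry in Scrapy's log output shows how many requests were dropped this way. How can this be solved? The FAQ (https://doc.scrapy.org/en/latest/faq.html?highlight=offsite%2Ffiltered) covers this problem. These log messages are emitted by one of Scrapy's middlewares; unless you have customized it, that middleware is the default Offsite Spider Middleware, whose job is to filter out requests whose domain is not in the allowed_domains list.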

The manual's section on OffsiteMiddleware (https://doc.scrapy.org/en/latest/topics/spider-middleware.html#scrapy.spidermiddlewares.offsite.OffsiteMiddleware) gives more detail.
There are two ways to keep requests from being filtered:
1. Add the request's domain to allowed_domains (the domain name only, not the full URL)
2. Pass dont_filter=True to scrapy.Request()

The following is quoted from the manual:

If the spider doesn’t define an allowed_domains attribute, or the attribute is empty, the offsite middleware will allow all requests.

If the request has the dont_filter attribute set, the offsite middleware will allow the request even if its domain is not listed in allowed domains.
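
To make the quoted rules concrete, here is a small standard-library model of the decision they describe. It is a simplified sketch under my own assumptions, not Scrapy's actual implementation: the function name passes_offsite_filter is hypothetical, and the host matching (exact domain or any subdomain) is a plain-Python approximation.

```python
from urllib.parse import urlparse


def passes_offsite_filter(url, allowed_domains, dont_filter=False):
    """Simplified model of the documented offsite rules (not Scrapy code)."""
    # Rule: a request with dont_filter set is always allowed.
    if dont_filter:
        return True
    # Rule: a missing or empty allowed_domains allows every request.
    if not allowed_domains:
        return True
    # Otherwise the host must equal an allowed domain or be a subdomain of it.
    host = (urlparse(url).hostname or "").lower()
    return any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


if __name__ == "__main__":
    print(passes_offsite_filter("https://www.example.com/a", ["example.com"]))
    print(passes_offsite_filter("https://other.org/a", ["example.com"]))
    print(passes_offsite_filter("https://other.org/a", ["example.com"], dont_filter=True))
```

In this model, an allowed_domains entry written as a full URL (for example "https://example.com/") never matches any host, which illustrates why the list should contain bare domain names.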
