In practice, yield dict works just as well as yield item. So why does the official documentation use yield item? Here is the official explanation:
The main goal in scraping is to extract structured data from unstructured sources, typically, web pages. Scrapy spiders can return the extracted data as Python dicts. While convenient and familiar, Python dicts lack structure: it is easy to make a typo in a field name or return inconsistent data, especially in a larger project with many spiders.
To define common output data format Scrapy provides the Item class. Item objects are simple containers used to collect the scraped data. They provide a dictionary-like API with a convenient syntax for declaring their available fields.
Various Scrapy components use extra information provided by Items: exporters look at declared fields to figure out columns to export, serialization can be customized using Item fields metadata, trackref tracks Item instances to help find memory leaks (see Debugging memory leaks with trackref), etc.
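Put simply: once a project has many spiders, it is easy to mistype a key when using a dict, and the wrong data is then passed along silently. With an Item, assigning to a field that was never declared raises a KeyError, which alerts the programmer and avoids this problem.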
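To make the difference concrete, here is a minimal sketch of the mechanism using only the standard library. The DeclaredFieldsItem class is a stand-in written for this illustration, not Scrapy's implementation, and ProductItem and its field names are made up. In Scrapy itself the equivalent would be a scrapy.Item subclass whose fields are declared with scrapy.Field().

```python
# Stand-in for illustration only; this is NOT Scrapy's Item implementation.
class DeclaredFieldsItem(dict):
    """Dict-like container that only accepts keys listed in `fields`."""

    fields = ()  # subclasses declare their allowed field names here

    def __init__(self, **values):
        super().__init__()
        for key, value in values.items():
            self[key] = value  # route through __setitem__ so keys are checked

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f"{type(self).__name__} does not support field: {key}")
        super().__setitem__(key, value)


class ProductItem(DeclaredFieldsItem):  # hypothetical item for this example
    fields = ("name", "price")


# A plain dict accepts the misspelled key without complaint.
plain = {}
plain["nmae"] = "Desk lamp"  # typo goes unnoticed
typo_stored_in_dict = "nmae" in plain

# The declared-fields container rejects the same kind of typo immediately.
item = ProductItem(name="Desk lamp")
try:
    item["pirce"] = 19.99  # typo for "price"
    typo_rejected = False
except KeyError:
    typo_rejected = True

item["price"] = 19.99  # the correctly spelled field is accepted
print(typo_stored_in_dict, typo_rejected, dict(item))
```

The plain dict keeps the misspelled key and the error only surfaces later, if at all, while the declared-fields container fails at the line where the typo occurs.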