scrapy 為什么要用yield item 而不用yield dict來傳輸數據


經過實踐, yield dict和yield item一樣有效果,不過為什么官方要用yield item ,以下是官方解釋:

 

The main goal in scraping is to extract structured data from unstructured sources, typically, web pages. Scrapy spiders can return the extracted data as Python dicts. While convenient and familiar, Python dicts lack structure: it is easy to make a typo in a field name or return inconsistent data, especially in a larger project with many spiders.

To define common output data format Scrapy provides the Item class. Item objects are simple containers used to collect the scraped data. They provide a dictionary-like API with a convenient syntax for declaring their available fields.

Various Scrapy components use extra information provided by Items: exporters look at declared fields to figure out columns to export, serialization can be customized using Item fields metadata, trackref tracks Item instances to help find memory leaks (see Debugging memory leaks with trackref), etc.

 

簡單的說,就是爬蟲過多的時候,使用dict容易出現鍵字打錯,而造成數據傳輸錯誤,使用item 系統可以通過key error來提示程序員從而避免這種問題。


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM