How do you run multiple spiders at the same time in Scrapy?

Reposted article. 2019-07-12 17:02. Tags: Scrapy, framework

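By default, when you run the scrapy crawl command, Scrapy runs a single spider per process. Besides the command line, however, Scrapy can also be driven from a script through its API, and spiders started through the API can run several at a time in the same process.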

The following example runs multiple spiders:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

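process = CrawlerProcess()  # create the process that manages the event loop (Twisted reactor)
process.crawl(MySpider1)  # schedule the first spider class on the event loop
process.crawl(MySpider2)  # schedule the second spider class on the event loop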
process.start() # the script will block here until all crawling jobs are finished

CrawlerRunner works as well. In this case you start and stop the Twisted reactor yourself:

import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

configure_logging()
runner = CrawlerRunner()
runner.crawl(MySpider1)
runner.crawl(MySpider2)
d = runner.join()
d.addBoth(lambda _: reactor.stop())

reactor.run() # the script will block here until all crawling jobs are finished

To run the spiders one after another instead, chain the deferreds:

import scrapy
from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

configure_logging()
runner = CrawlerRunner()

@defer.inlineCallbacks
def crawl():
    yield runner.crawl(MySpider1)
    yield runner.crawl(MySpider2)
    reactor.stop()

crawl()
reactor.run() # the script will block here until the last crawl call is finished

For more details, see:

Scrapy documentation
