Passing Arguments to a Custom Spider in Scrapy

This article is a repost. 2019-06-27 13:41 · Python

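In practice, we may want to pass custom arguments when starting Scrapy so that one spider can drive different business flows. After some searching on Google, the following approach turned out to work.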

Modify the Spider constructor

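import scrapy

# Placeholder: the original snippet uses news_url without defining it.
news_url = "https://example.com/news"


class myspider(scrapy.Spider):

    # Spider name
    name = "myspider"

    # Constructor: values passed with -a on the command line, or with -d
    # to scrapyd's schedule.json, arrive here as keyword arguments (strings).
    def __init__(self, tp=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.tp = tp

    # Start URLs (not used once start_requests is overridden as below)
    # start_urls = ['https://www.google.com']

    # Build the initial requests according to the tp argument
    def start_requests(self):
        if self.tp == 'tp_news_spider':
            # make_requests_from_url() is deprecated and has been removed from
            # recent Scrapy releases, so the Request is built directly.
            yield scrapy.Request(news_url, dont_filter=True)
        else:
            # The original leaves this branch as an empty URL list.
            urls = []
            for url in urls:
                yield scrapy.Request(url, dont_filter=True)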

Start from the command line

scrapy crawl myspider -a tp=tp_news_spider
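Values given with -a reach the spider constructor as strings, which is why the spider compares tp against the string 'tp_news_spider'. The sketch below only imitates that idea and is not Scrapy's own code; parse_spider_args is a hypothetical helper name used for illustration.

```python
def parse_spider_args(pairs):
    """Turn ["tp=tp_news_spider", ...] into a dict of string keyword arguments."""
    kwargs = {}
    for pair in pairs:
        if "=" not in pair:
            raise ValueError(f"expected NAME=VALUE, got {pair!r}")
        # Split on the first "=" only, so values may themselves contain "=".
        name, value = pair.split("=", 1)
        kwargs[name] = value
    return kwargs


if __name__ == "__main__":
    # Every value stays a string; convert explicitly (int(...), etc.) if needed.
    print(parse_spider_args(["tp=tp_news_spider", "page=2"]))
```

Because nothing is converted automatically, a numeric argument such as page=2 would have to be cast inside the spider before use.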

Managing the spider with Scrapyd

Spider arguments can be passed to schedule.json by adding extra -d options to the request:

curl http://localhost:6800/schedule.json -d project=myproject -d spider=myspider -d setting=DOWNLOAD_DELAY=2 -d tp=tp_news_spider
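The same request can be assembled in Python. This is a minimal sketch using only the standard library; build_schedule_request is a hypothetical helper, and the host, project and spider names are the ones from the curl command above. It builds the form-encoded POST without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_schedule_request(base_url, project, spider, settings=None, **spider_args):
    """Build (but do not send) a POST to scrapyd's schedule.json endpoint."""
    fields = [("project", project), ("spider", spider)]
    # Each Scrapy setting becomes its own setting=NAME=VALUE field.
    for key, value in (settings or {}).items():
        fields.append(("setting", f"{key}={value}"))
    # Any remaining keyword is forwarded as a spider argument, e.g. tp=...
    fields.extend(spider_args.items())
    body = urlencode(fields).encode("ascii")
    return Request(f"{base_url}/schedule.json", data=body, method="POST")


if __name__ == "__main__":
    req = build_schedule_request(
        "http://localhost:6800", "myproject", "myspider",
        settings={"DOWNLOAD_DELAY": 2}, tp="tp_news_spider",
    )
    print(req.full_url)
    print(req.data.decode("ascii"))
```

With a scrapyd instance running, the request could then be sent with urllib.request.urlopen(req).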

Cron scheduling

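// Posts a schedule request to scrapyd; tp is forwarded to the spider as an argument.
public async Task SchedulePollingTaskBackgroundJobAsync()
{
    try
    {
        var response = await @"http://172.0.0.1:8080/schedule.json"
            .WithBasicAuth("user", "pwd")
            .PostUrlEncodedAsync(new { project = "myproject", spider = "myspider", tp = "tp_news_spider" })
            .ReceiveString();
    }
    catch (Exception ex)
    {
        // The original swallows the exception here; log ex in real use.
    }
}

// Cron expression validator: http://www.bejson.com/othertools/cronvalidate/
// With a leading seconds field, "0/15 * * * * ?" fires every 15 seconds.
RecurringJob.AddOrUpdate(() => SchedulePollingTaskBackgroundJobAsync(), @"0/15 * * * * ?", TZConvert.GetTimeZoneInfo("Asia/Shanghai"));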
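To make the schedule concrete: in cron dialects with a leading seconds field, 0/15 means "start at second 0 and repeat every 15 seconds". The sketch below expands such a field into the seconds it matches. expand_step_field is a hypothetical helper that handles only the start/step, * and single-value forms; it is not a full cron parser.

```python
def expand_step_field(field, upper=59):
    """Expand a cron field such as "0/15", "*/20", "*" or "7" into matching values."""
    if "/" in field:
        start_text, step_text = field.split("/", 1)
        start = 0 if start_text == "*" else int(start_text)
        step = int(step_text)
        if step <= 0:
            raise ValueError("step must be positive")
        return list(range(start, upper + 1, step))
    if field == "*":
        return list(range(0, upper + 1))
    return [int(field)]


if __name__ == "__main__":
    print(expand_step_field("0/15"))  # [0, 15, 30, 45]
```

So the recurring job above would submit a new schedule request four times per minute.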

REFER:
https://blog.csdn.net/Q_AN1314/article/details/50748700
