Running Scrapy crawl jobs on a schedule

This article is a repost. 2015-05-13 14:45. Tag: crawlers

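On Ubuntu, I needed Scrapy to run a crawl job on a schedule. Scrapy itself does not provide a scheduling feature, so I used crontab to trigger the crawl periodically.

First, write the script that cron will run, cron.sh: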

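#!/bin/sh

# cron runs jobs with a minimal PATH, so add the directory that holds the scrapy executable
export PATH=$PATH:/usr/local/bin

# Run from the Scrapy project directory; stop if it cannot be entered
cd /home/zhangchao/CVS/testCron || exit 1

# Start the crawl in the background, appending stdout and stderr to example.log
nohup scrapy crawl example >> example.log 2>&1 &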
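Because the script starts the crawl in the background and cron fires it every minute, a crawl that takes longer than a minute would overlap with the next one. The sketch below is one possible guard, not part of the original script: run_exclusive is a hypothetical helper name and the lock path in the usage comment is an assumption. It relies only on mkdir being atomic.

```shell
#!/bin/sh
# Hypothetical helper (not from the original article): run a command only
# when no earlier run still holds the lock directory.
run_exclusive() {
    lock_dir=$1
    shift
    # mkdir is atomic: it fails if the directory already exists
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    # Release the lock so the next scheduled run can start
    rmdir "$lock_dir"
    return "$status"
}

# Assumed usage inside cron.sh, replacing the nohup line
# (the lock path is an illustrative choice):
# run_exclusive /tmp/example-crawl.lock scrapy crawl example >> example.log 2>&1
```

If the process is killed before rmdir runs, the lock directory stays behind and has to be removed by hand, so this is a starting point rather than a complete solution.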

Next, run crontab -e and specify the command to run and how often to run it. Here I need the crawl command scrapy crawl example to run every minute:

# Edit this file to introduce tasks to be run by cron.
#
# Each task to run has to be defined through a single line
# indicating with different fields when the task will be run
# and what command to run for the task
#
# To define the time you can provide concrete values for
# minute (m), hour (h), day of month (dom), month (mon),
# and day of week (dow) or use '*' in these fields (for 'any').#
# Notice that tasks will be started based on the cron's system
# daemon's notion of time and timezones.
#
# Output of the crontab jobs (including errors) is sent through
# email to the user the crontab file belongs to (unless redirected).
#
# For example, you can run a backup of all your user accounts
# at 5 a.m every week with:
# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/
#
# For more information see the manual pages of crontab(5) and cron(8)
#
# m h  dom mon dow   command

*/1 * * * *  sh /home/zhangchao/CVS/testCron/cron.sh
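Editing interactively with crontab -e works, but the same entry can also be installed from a script. The following is a minimal sketch under stated assumptions: add_cron_entry is a hypothetical helper name, not part of cron or of the original article. It reads an existing crontab on standard input and prints it back with the entry appended only if that exact line is not already present, so running it twice does not duplicate the job.

```shell
#!/bin/sh
# Hypothetical helper: print the crontab read from stdin, adding the given
# entry only when no identical line exists yet.
add_cron_entry() {
    entry=$1
    existing=$(cat)
    if printf '%s\n' "$existing" | grep -Fxq -- "$entry"; then
        # Entry already present: return the crontab unchanged
        printf '%s\n' "$existing"
    elif [ -z "$existing" ]; then
        # Empty crontab: the entry becomes the only line
        printf '%s\n' "$entry"
    else
        printf '%s\n%s\n' "$existing" "$entry"
    fi
}

# Assumed usage (crontab - installs the crontab read from stdin):
# crontab -l 2>/dev/null \
#   | add_cron_entry '*/1 * * * *  sh /home/zhangchao/CVS/testCron/cron.sh' \
#   | crontab -
```

Keeping the helper as a plain filter means it can be tried on a text file first, without touching the real crontab.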

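After saving the crontab, I found that there was no cron log under /var/log/ on Ubuntu. The reason is that Ubuntu does not enable cron logging by default. To turn it on, do the following.

Open the rsyslog configuration with emacs /etc/rsyslog.d/50-default.conf (saving the file requires root privileges) and uncomment the line that begins with cron.*.

Then restart rsyslog with sudo service rsyslog restart.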
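The uncommenting step can also be done without an editor. The sketch below assumes the stock line looks like #cron.* followed by whitespace and /var/log/cron.log, which should be checked against the actual file first. uncomment_cron_rule is a hypothetical helper name, and the backup path in the usage comment is likewise an assumption.

```shell
#!/bin/sh
# Hypothetical filter: remove the leading "#" from a line that starts with
# "#cron.*" and leave every other line untouched.
uncomment_cron_rule() {
    sed 's/^#[[:space:]]*\(cron\.\*\)/\1/'
}

# Assumed usage (keep a copy of the original file, then rewrite it):
# sudo cp /etc/rsyslog.d/50-default.conf /tmp/50-default.conf.orig
# uncomment_cron_rule < /tmp/50-default.conf.orig \
#   | sudo tee /etc/rsyslog.d/50-default.conf > /dev/null
# sudo service rsyslog restart
```

Reading from the copy rather than from the file being written avoids truncating the configuration while it is still being read.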

Finally, run tail -f /var/log/cron.log to watch the cron log. It shows cron.sh being executed once a minute.

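While we are here, a quick review of common crontab schedules (the five fields are minute, hour, day of month, month, and day of week):

Every minute: */1 * * * * (equivalent to * * * * *)

Every hour, on the hour: 0 * * * *

Every day, at midnight: 0 0 * * *

Every week, on Sunday at midnight: 0 0 * * 0

Every month, on the 1st at midnight: 0 0 1 * *

Every year, on January 1 at midnight: 0 0 1 1 *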
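To make the field order in these schedules concrete, here is a small sketch that splits a crontab line into its five time fields and the command. describe_cron_line is a hypothetical helper written for illustration. It only labels the fields and does not validate them or interpret ranges and steps.

```shell
#!/bin/sh
# Hypothetical helper: label the five schedule fields and the command of a
# single crontab entry.
describe_cron_line() {
    # read splits on whitespace; the last variable receives the rest of the
    # line, so the command keeps its own arguments
    read -r minute hour dom month dow cmd <<EOF
$1
EOF
    printf 'minute=%s hour=%s day_of_month=%s month=%s day_of_week=%s command=%s\n' \
        "$minute" "$hour" "$dom" "$month" "$dow" "$cmd"
}

# Example call with the entry used in this article:
# describe_cron_line '*/1 * * * *  sh /home/zhangchao/CVS/testCron/cron.sh'
```

Passing the line through a here-document keeps the * characters from being expanded into file names by the shell.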
