1. Create a virtual environment named sd
mkvirtualenv sd  # a dedicated env keeps things easy to manage
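Note that mkvirtualenv is provided by virtualenvwrapper, not by Python itself. If the command is missing, a typical setup looks like the following; the virtualenvwrapper.sh path varies by system, so treat it as an assumption:
pip3 install virtualenvwrapper
# add to ~/.bashrc so the shell can find the helper functions:
export WORKON_HOME=$HOME/.virtualenvs
source /usr/local/bin/virtualenvwrapper.sh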
2. Install scrapyd
pip3 install scrapyd
3. Configure
mkdir /etc/scrapyd
vim /etc/scrapyd/scrapyd.conf
Write the following configuration into it.
Reference, official docs: https://scrapyd.readthedocs.io/en/stable/config.html#config
[scrapyd]
eggs_dir          = eggs
logs_dir          = logs
items_dir         =
jobs_to_keep      = 5
dbs_dir           = dbs
max_proc          = 0
max_proc_per_cpu  = 4
finished_to_keep  = 100
poll_interval     = 5.0
#bind_address     = 127.0.0.1
bind_address      = 0.0.0.0
http_port         = 6800
debug             = off
runner            = scrapyd.runner
application       = scrapyd.app.application
launcher          = scrapyd.launcher.Launcher
webroot           = scrapyd.website.Root

[services]
schedule.json     = scrapyd.webservice.Schedule
cancel.json       = scrapyd.webservice.Cancel
addversion.json   = scrapyd.webservice.AddVersion
listprojects.json = scrapyd.webservice.ListProjects
listversions.json = scrapyd.webservice.ListVersions
listspiders.json  = scrapyd.webservice.ListSpiders
delproject.json   = scrapyd.webservice.DeleteProject
delversion.json   = scrapyd.webservice.DeleteVersion
listjobs.json     = scrapyd.webservice.ListJobs
daemonstatus.json = scrapyd.webservice.DaemonStatus
bind_address: the default is 127.0.0.1 (local access only); changing it to 0.0.0.0 lets external hosts reach the server.
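After editing the config, restart scrapyd and confirm it is listening on all interfaces rather than only on loopback. One quick check (ss is the usual tool on modern Linux; netstat works too):
ss -tlnp | grep 6800
# expect something like 0.0.0.0:6800, not 127.0.0.1:6800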

I. Deploy & run

deploy: push the Scrapy project to the server
# scrapyd-deploy <deploy-target-name> -p <project-name>
scrapyd-deploy ubuntu -p douyu
(the target name "ubuntu" must be defined in the project's scrapy.cfg; see the sketch after section IV)

run: start a spider
# curl http://localhost:6800/schedule.json -d project=project_name -d spider=spider_name
curl http://127.0.0.1:6800/schedule.json -d project=douyu -d spider=dy

stop: cancel a job
# curl http://localhost:6800/cancel.json -d project=project_name -d job=jobid
curl http://127.0.0.1:6800/cancel.json -d project=douyu -d job=$1    # $1 = jobid passed in as a script argument

II. Allowing external access

Locate the config file default_scrapyd.conf:
find /home/wg -name default_scrapyd.conf
cd /home/wg/scrapy_env/lib/python3.6/site-packages/scrapyd

Allow external access:
vim default_scrapyd.conf
bind_address = 0.0.0.0

III. Remote monitoring, URL endpoints:

1. Get daemon status
http://127.0.0.1:6800/daemonstatus.json
2. List projects
http://127.0.0.1:6800/listprojects.json
3. List the spiders published under a project
http://127.0.0.1:6800/listspiders.json?project=myproject
4. List the published versions of a project
http://127.0.0.1:6800/listversions.json?project=myproject
5. Get the run state of a project's jobs
http://127.0.0.1:6800/listjobs.json?project=myproject
6. Start a spider on the server (it must already be published there)
http://127.0.0.1:6800/schedule.json (POST, data={"project": myproject, "spider": myspider})
7. Delete one version of a project
http://127.0.0.1:6800/delversion.json (POST, data={"project": myproject, "version": myversion})
8. Delete a whole project, including every published version of its spiders
http://127.0.0.1:6800/delproject.json (POST, data={"project": myproject})

IV. Common scripts

Schedule a spider in a loop:
while true
do
    curl http://127.0.0.1:6800/schedule.json -d project=FXH -d spider=five_sec_info
    sleep 10
done

Print a timestamp for the log:
echo "$(date +%Y-%m-%d:%H:%M.%S), xx-spider scheduled start--"
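As mentioned in section I, scrapyd-deploy reads the deploy target name ("ubuntu" above) from a [deploy:...] section in the scrapy.cfg at the root of the Scrapy project. A minimal sketch, assuming the server from this guide; the url and project name here are illustrative:
# scrapy.cfg
[settings]
default = douyu.settings

[deploy:ubuntu]
url = http://192.168.12.80:6800/
project = douyu
With this in place, scrapyd-deploy ubuntu -p douyu packages the project as an egg and uploads it to the server's addversion.json endpoint.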
Start:
scrapyd
Find this machine's IP (e.g. with ifconfig), then visit it in a browser:
192.168.12.80:6800
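The same check works from the command line, which is handy on a headless server; the JSON fields shown are typical for scrapyd but may differ slightly across versions:
curl http://192.168.12.80:6800/daemonstatus.json
# e.g. {"node_name": "ubuntu", "status": "ok", "pending": 0, "running": 0, "finished": 0}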
scrapyd-client and deployment
1. Install scrapyd-client
pip3 install scrapyd-client
2. A successful installation provides a new command, scrapyd-deploy, which is the deployment command.
We can run the following test command to check that scrapyd-client installed correctly:
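For example, asking for the help text; if the install succeeded, this should print the command's usage:
scrapyd-deploy -h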
3. On Windows, scrapyd-deploy is not recognized as a command ("not an internal or external command"), so it needs some extra setup.
How to fix scrapyd-deploy not running under Windows:
Go to the Python Scripts directory (here C:\Python36\Scripts) and create two new files:
scrapy.bat
scrapyd-deploy.bat
Edit the two files.
Put the following into scrapy.bat:
@echo off "C:\Python36" "C:\Python36\Scripts\scrapy" %*
Put the following into scrapyd-deploy.bat:
@echo off "C:\Python36\python" "C:\Python36\Scripts\scrapyd-deploy" %*
4. Check again
It works now.
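Opening a new terminal and re-running the test command should now print the usage text instead of the "not recognized" error, after which deployment works the same as on Linux:
scrapyd-deploy -h
scrapyd-deploy ubuntu -p douyu    # same deploy target as in section I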