Scrapy notes: installing Splash, a JavaScript rendering service


0.

splash: mermaid; to splash, to spatter

1. References

First experience using Splash

Installing Docker on Windows

 

https://blog.scrapinghub.com/2015/03/02/handling-javascript-in-scrapy-with-splash/

Splash is our in-house solution for JavaScript rendering, implemented in Python using Twisted and QT.  Per the official blog, Splash is apparently Scrapinghub's in-house solution.

https://scrapinghub.com/ 

We're the creators and the main maintainers of Scrapy. In other words, Scrapinghub is the team behind Scrapy.

 

github: scrapinghub/splash

Splash is a javascript rendering service with an HTTP API. It's a lightweight browser with an HTTP API, implemented in Python 3 using Twisted and QT5.

It's fast, lightweight and state-less which makes it easy to distribute. It is used to render JavaScript pages.

http://splash.readthedocs.io/en/latest/index.html

Official Splash documentation

github: scrapy-plugins/scrapy-splash

This library provides Scrapy and JavaScript integration using Splash. It covers how to use Splash from Scrapy.

 

http://splash.readthedocs.io/en/stable/api.html#request-filters  

Splash supports filtering requests based on Adblock Plus rules.  I have not gotten this working yet.

 

2. Installation and usage

https://stackoverflow.com/questions/30345623/scraping-dynamic-content-using-python-scrapy

The answer mentions ScrapyJS, but its link now redirects to https://github.com/scrapy-plugins/scrapy-splash#installation

https://pypi.python.org/pypi/scrapyjs

https://pypi.python.org/pypi/scrapy-splash

2.1 Installing scrapy-splash

C:\Users\win7>pip install scrapy-splash
Collecting scrapy-splash
  Downloading scrapy_splash-0.7.2-py2.py3-none-any.whl
Installing collected packages: scrapy-splash
Successfully installed scrapy-splash-0.7.2
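
Installing the package is only the first step: scrapy-splash also has to be enabled in the Scrapy project's settings.py. The following is a minimal sketch based on the scrapy-splash README. The SPLASH_URL value is an assumption (the Docker machine IP that appears later in these notes), so adjust it to your own host.

```python
# settings.py (sketch): enable scrapy-splash in a Scrapy project.
# Assumption: Splash is reachable on the Docker machine IP from these notes.
SPLASH_URL = "http://192.168.99.100:8050"

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that deduplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With these settings in place, a spider would yield scrapy_splash.SplashRequest(url, callback, args={'wait': 0.5}) instead of a plain scrapy.Request for pages that need JavaScript rendering.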

2.2 Installing the scrapinghub/splash image via Docker

Find the download link on the official site:

https://store.docker.com/editions/community/docker-ce-desktop-windows

Get Docker Community Edition for Windows

Docker for Windows is available for free.

Requires Microsoft Windows 10 Professional or Enterprise 64-bit. For previous versions get Docker Toolbox.

 

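Right-click the installer and run it as administrator. It is probably best to also tick the optional components (not verified).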

 

Right-click Docker Quickstart Terminal and run it as administrator; it warns that bash.exe was not found.

Output:

Creating CA: C:\Users\win7\.docker\machine\certs\ca.pem
Creating client certificate: C:\Users\win7\.docker\machine\certs\cert.pem
Running pre-create checks...
(default) Image cache directory does not exist, creating it at C:\Users\win7\.docker\machine\cache...
(default) No default Boot2Docker ISO found locally, downloading the latest release...
(default) Latest release for github.com/boot2docker/boot2docker is v17.09.0-ce
(default) Downloading C:\Users\win7\.docker\machine\cache\boot2docker.iso from https://github.com/boot2docker/boot2docker/releases/download/v17.09.0-ce/boot2docker.iso...
(default) 0%....10%....20%....30%....40%....50%....60%....70%....80%....90%....100%
Creating machine...
(default) Copying C:\Users\win7\.docker\machine\cache\boot2docker.iso to C:\Users\win7\.docker\machine\machines\default\boot2docker.iso...
(default) Creating VirtualBox VM...
(default) Creating SSH key...
(default) Starting the VM...
(default) Check network to re-create if needed...
(default) Windows might ask for the permission to create a network adapter. Sometimes, such confirmation window is minimized in the taskbar.
(default) Found a new host-only adapter: "VirtualBox Host-Only Ethernet Adapter #2"
(default) Windows might ask for the permission to configure a network adapter. Sometimes, such confirmation window is minimized in the taskbar.
(default) Windows might ask for the permission to configure a dhcp server. Sometimes, such confirmation window is minimized in the taskbar.
(default) Waiting for an IP...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...
Checking connection to Docker...
Docker is up and running!
To see how to connect your Docker Client to the Docker Engine running on this virtual machine, run: D:\Program Files\Docker Toolbox\docker-machine.exe env default



                        ##         .
                  ## ## ##        ==
               ## ## ## ## ##    ===
           /"""""""""""""""""\___/ ===
      ~~~ {~~ ~~~~ ~~~ ~~~~ ~~~ ~ /  ===- ~~~
           \______ o           __/
             \    \         __/
              \____\_______/

docker is configured to use the default machine with IP 192.168.99.100
For help getting started, check out the docs at https://docs.docker.com

Start interactive shell

win7@win7-PC MINGW64 ~
$ docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 17.09.0-ce
Storage Driver: aufs
 Root Dir: /mnt/sda1/var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 0
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 06b9cb35161009dcb7123345749fef02f7cea8e0
runc version: 3f2f8b84a77f73d38244dd690525642a72156c64
init version: 949e6fa
Security Options:
 seccomp
  Profile: default
Kernel Version: 4.4.89-boot2docker
Operating System: Boot2Docker 17.09.0-ce (TCL 7.2); HEAD : 06d5c35 - Wed Sep 27 23:22:43 UTC 2017
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 995.8MiB
Name: default
ID: O33J:6GDF:AQ6P:RBM7:6KLF:OZHY:2N3J:QZKV:YIJT:G3AI:XCPD:NZ3G
Docker Root Dir: /mnt/sda1/var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 17
 Goroutines: 26
 System Time: 2017-10-18T09:58:42.414047781Z
 EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
 provider=virtualbox
Experimental: false
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false


win7@win7-PC MINGW64 ~
$ ipconfig

Windows IP Configuration


Ethernet adapter lan:

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::f950:bf55:726b:b7a6%14
   IPv4 Address. . . . . . . . . . . : 192.168.144.100
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.144.254

Ethernet adapter VirtualBox Host-Only Network #2:

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::1c18:13ad:7ed2:c0ff%29
   IPv4 Address. . . . . . . . . . . : 192.168.99.1
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . :

Tunnel adapter isatap.{CE007B04-2C7A-4A52-8BBF-1BCB4682EEB9}:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

Tunnel adapter Teredo Tunneling Pseudo-Interface:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

Tunnel adapter isatap.{93C68FD9-301C-484C-AFCB-5549CA24453B}:

   Media State . . . . . . . . . . . : Media disconnected
   Connection-specific DNS Suffix  . :

win7@win7-PC MINGW64 ~
$

 

The important parts of that output:

(default) Copying C:\Users\win7\.docker\machine\cache\boot2docker.iso to C:\Users\win7\.docker\machine\machines\default\boot2docker.iso...
(default) Creating VirtualBox VM...

docker is configured to use the default machine with IP 192.168.99.100
For help getting started, check out the docs at https://docs.docker.com

Connect with PuTTY:

Host: 192.168.99.100
Port: 22

Username: docker
Password: tcuser

The first time, the image has to be pulled from Docker Hub:

sudo docker pull scrapinghub/splash

After that, start the Splash service with the command below each time; it serves over HTTP, HTTPS and telnet:

# Usually only HTTP mode is used, so publishing port 8050 alone is enough.
# Splash will run on 0.0.0.0 at ports 8050 (http), 8051 (https) and 5023 (telnet).
sudo docker run -p 5023:5023 -p 8050:8050 -p 8051:8051 scrapinghub/splash
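
Before opening a browser, it can help to confirm from the Windows host that the published ports are actually reachable on the Docker machine. The helper below is a small sketch using only the Python standard library. The name port_is_open is hypothetical, and the IP in the usage note is the one reported by Docker Quickstart Terminal in these notes.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError on refusal, timeout or unreachable host.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("192.168.99.100", 8050) should return True once the container is running, and the same call for 8051 or 5023 shows whether the HTTPS and telnet ports were published as well.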

Then open this address in a browser:

http://192.168.99.100:8050
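
The same service can also be called from code through Splash's HTTP API. The sketch below uses the render.html endpoint with its url and wait arguments and only the Python standard library. The function names are hypothetical, and the base address is assumed to be the Docker machine IP used above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Splash runs on the Docker machine IP from these notes.
SPLASH = "http://192.168.99.100:8050"


def render_html_url(page_url, wait=0.5, base=SPLASH):
    """Build a render.html URL asking Splash to load page_url and wait `wait` seconds."""
    query = urlencode({"url": page_url, "wait": wait})
    return f"{base}/render.html?{query}"


def fetch_rendered_html(page_url, wait=0.5, base=SPLASH, timeout=30):
    """Return the HTML of page_url after Splash has executed its JavaScript."""
    with urlopen(render_html_url(page_url, wait, base), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling fetch_rendered_html("http://example.com") would return the page source as Splash sees it after rendering, which is a quick way to check the service before wiring it into Scrapy.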

