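Scrapy is a well-known, powerful application framework written for crawling websites and extracting their data. A framework, in this sense, is a highly reusable project template with the relevant functionality already built in.
On Linux and Mac, installation takes a single pip command: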
pip install scrapy
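Installing on Windows, however, has many pitfalls. Below I describe the problems I ran into while installing and configuring Scrapy on Windows 10, along with how I solved each one. I hope the summary helps.
Package download page: https://www.lfd.uci.edu/~gohlke/pythonlibs/
Common problem 1: pip needs to be upgraded
If your pip is old, the installation may ask you to update it, so it is best to upgrade pip first.
The upgrade command (run in cmd) is: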
python -m pip install --upgrade pip
Once the upgrade finishes, this kind of problem goes away.
Common problem 2: installing wheel
pip install wheel
If wheel is not installed yet, this command installs it. If it is already installed, the command prints the message below and does not install it again:
Requirement already satisfied: wheel in d:\python3\lib\site-packages
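Common problem 3: lxml is missing
With wheel installed, download the lxml .whl file that matches your setup from the LFD page above, and do not rename the file. (Mine is 64-bit Windows with Python 3.6.)
After downloading, open cmd and install the .whl file: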
pip install lxml-4.1.1-cp36-cp36m-win_amd64.whl
If lxml is not installed yet, the command installs it. If it is already installed, output like the following confirms that:
Requirement already satisfied: lxml==4.1.1 from file:///D:/lxml-4.1.1-cp36-cp36m-win_amd64.whl in d:\python3\lib\site-packages
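Choosing the right wheel depends on the Python version and on whether the interpreter is a 32-bit or 64-bit build. As a minimal sketch using only the standard library (the function name interpreter_summary is my own, and the "cp" prefix assumes the standard CPython interpreter), both can be read from the running interpreter:

```python
import struct
import sys


def interpreter_summary():
    """Return (python_tag, bits) for the running interpreter.

    python_tag follows the 'cpXY' pattern seen in wheel file names,
    e.g. 'cp36' for CPython 3.6. Illustrative helper only.
    """
    # A pointer is 4 bytes on a 32-bit build and 8 bytes on a 64-bit build.
    bits = struct.calcsize("P") * 8
    # Assumption: the interpreter is CPython, hence the 'cp' prefix.
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    return python_tag, bits


if __name__ == "__main__":
    tag, bits = interpreter_summary()
    print("Python tag:", tag)
    print("Interpreter bits:", bits)
```

Note that what matters is the bitness of the Python build, which can be 32-bit even on a 64-bit copy of Windows.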
Common problem 4: path conflict
Error in sitecustomize; set PYTHONVERBOSE for traceback: AttributeError: module 'sys' has no attribute 'setdefaultencoding'
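The cause is a conflict: sys.path contains an extra site-packages entry from Python 2.7.
Go to the "…/local/lib/python3.6/site-packages/" directory (the exact location varies from machine to machine) and delete the offending path found there.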
python -v homebrew.pth
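To see whether sys.path has picked up directories from another Python version, a small check like the one below can help. It is only a sketch: the helper name foreign_version_paths and the directory-name pattern it looks for (python27, python2.7, python36 and so on) are my own assumptions about how such paths are usually named.

```python
import re
import sys


def foreign_version_paths(paths, major, minor):
    """Return entries of `paths` that mention a different Python version.

    Matches directory names such as 'python27', 'python2.7' or 'Python36'.
    """
    pattern = re.compile(r"python(\d)\.?(\d+)", re.IGNORECASE)
    suspicious = []
    for entry in paths:
        for found_major, found_minor in pattern.findall(entry):
            if (int(found_major), int(found_minor)) != (major, minor):
                suspicious.append(entry)
                break  # one mismatch is enough to flag this entry
    return suspicious


if __name__ == "__main__":
    current = sys.version_info[:2]
    for entry in foreign_version_paths(sys.path, *current):
        print("Possibly conflicting path:", entry)
```

If the script prints nothing, no entry on sys.path names a different Python version.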
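Common problem 5: Twisted is missing
Download Twisted and pick the build that matches your machine. (Mine is Python 3.6 on a 64-bit OS. In the file name, cp36 means Python 3.6 and amd64 means the 64-bit build.)
After downloading, install it with: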
pip install Twisted-17.9.0-cp36-cp36m-win_amd64.whl
If Twisted is not installed yet, the command installs it and reports success, as follows:
Successfully installed Twisted-17.9.0
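The wheel file names used above pack the package name, version, Python version, ABI, and platform into dash-separated fields. The sketch below splits a name into those fields so you can compare them with your own setup. The function parse_wheel_filename is a hypothetical helper of mine, and it assumes the common five-field form without the optional build tag.

```python
def parse_wheel_filename(filename):
    """Split a wheel file name into its dash-separated fields.

    Assumes the five-field form
    name-version-pythontag-abitag-platformtag.whl (no optional build tag).
    """
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError("not a wheel file: " + filename)
    fields = filename[:-len(suffix)].split("-")
    if len(fields) != 5:
        raise ValueError("unexpected number of fields: " + filename)
    keys = ("name", "version", "python_tag", "abi_tag", "platform_tag")
    return dict(zip(keys, fields))


if __name__ == "__main__":
    info = parse_wheel_filename("Twisted-17.9.0-cp36-cp36m-win_amd64.whl")
    for key, value in info.items():
        print(key, "=", value)
```

This is also why the file should not be renamed: pip reads these fields from the name to decide whether the wheel fits the current interpreter.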
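Common problem 6: UnicodeDecodeError
(I had already worked past this problem by the time of writing, so the tracebacks below are similar ones found online. The content is broadly the same and the problem is identical.)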
Exception:
Traceback (most recent call last):
  File "c:\program files\python36\lib\site-packages\pip\compat\__init__.py", line 73, in console_to_str
    return s.decode(sys.__stdout__.encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 34: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\program files\python36\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "c:\program files\python36\lib\site-packages\pip\commands\install.py", line 342, in run
    prefix=options.prefix_path,
  File "c:\program files\python36\lib\site-packages\pip\req\req_set.py", line 784, in install
    **kwargs
  File "c:\program files\python36\lib\site-packages\pip\req\req_install.py", line 878, in install
    spinner=spinner,
  File "c:\program files\python36\lib\site-packages\pip\utils\__init__.py", line 676, in call_subprocess
    line = console_to_str(proc.stdout.readline())
  File "c:\program files\python36\lib\site-packages\pip\compat\__init__.py", line 75, in console_to_str
    return s.decode('utf_8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 34: invalid start byte
Or the following error:
Exception:
Traceback (most recent call last):
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\compat\__init__.py", line 73, in console_to_str
    return s.decode(sys.__stdout__.encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 34: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\commands\install.py", line 342, in run
    prefix=options.prefix_path,
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\req\req_set.py", line 784, in install
    **kwargs
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\req\req_install.py", line 878, in install
    spinner=spinner,
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\__init__.py", line 676, in call_subprocess
    line = console_to_str(proc.stdout.readline())
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\compat\__init__.py", line 75, in console_to_str
    return s.decode('utf_8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 34: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\commands\install.py", line 385, in run
    requirement_set.cleanup_files()
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\req\req_set.py", line 729, in cleanup_files
    req.remove_temporary_source()
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\req\req_install.py", line 977, in remove_temporary_sou
    rmtree(self.source_dir)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 212, in call
    raise attempt.get()
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\six.py", line 686, in reraise
    raise value
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\__init__.py", line 102, in rmtree
    onerror=rmtree_errorhandler)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\shutil.py", line 488, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\shutil.py", line 387, in _rmtree_unsafe
    onerror(os.rmdir, path, sys.exc_info())
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\__init__.py", line 114, in rmtree_errorhandler
    func(path)
PermissionError: [WinError 32] 另一個程序正在使用此文件,進程無法訪問。: 'C:\\Users\\59740\\AppData\\Local\\Temp\\pip-build-1djzmudb\\scrapy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\commands\install.py", line 385, in run
    requirement_set.cleanup_files()
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\build.py", line 38, in __exit__
    self.cleanup()
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\build.py", line 42, in cleanup
    rmtree(self.name)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 212, in call
    raise attempt.get()
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\six.py", line 686, in reraise
    raise value
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\__init__.py", line 102, in rmtree
    onerror=rmtree_errorhandler)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\shutil.py", line 488, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\shutil.py", line 378, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\shutil.py", line 387, in _rmtree_unsafe
    onerror(os.rmdir, path, sys.exc_info())
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\__init__.py", line 114, in rmtree_errorhandler
    func(path)
PermissionError: [WinError 32] 另一個程序正在使用此文件,進程無法訪問。: 'C:\\Users\\59740\\AppData\\Local\\Temp\\pip-build-1djzmudb\\scrapy
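Solution:
Open
c:\program files\python36\lib\site-packages\pip\compat\__init__.py
Find
return s.decode('utf_8')
and change it to
return s.decode('cp936')
This is an encoding problem. Python 3 uses UTF-8 by default, but the Windows terminal on a Chinese-language system still uses GBK (code page 936).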
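To see why the change helps, the sketch below reproduces the failure outside pip: bytes produced under cp936 are generally not valid UTF-8, so decoding them as UTF-8 raises UnicodeDecodeError, while decoding them as cp936 succeeds. The helper decode_console_bytes and its fallback order are my own illustration, not pip's actual code.

```python
def decode_console_bytes(raw, encodings=("utf-8", "cp936")):
    """Decode bytes by trying each encoding in order.

    Illustrative helper only; pip's real implementation differs.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: never fail, replace whatever cannot be decoded.
    return raw.decode(encodings[0], errors="replace")


if __name__ == "__main__":
    # Bytes as a GBK (cp936) console would emit them.
    gbk_bytes = "中文".encode("cp936")
    try:
        gbk_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("utf-8 decoding failed:", exc.reason)
    # ascii() keeps the demo printable on any console encoding.
    print("decoded with fallback:", ascii(decode_console_bytes(gbk_bytes)))
```

Editing a file inside site-packages is a workaround that is lost whenever pip is reinstalled or upgraded, so it is worth checking whether a newer pip still shows the error before patching it.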
Common problem 7: win32 is missing
When the module is missing, the following error appears:
ModuleNotFoundError: No module named 'win32api'
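Download pywin32 and pick the build that matches your machine. (Mine is Python 3.6 on a 64-bit OS. In the file name, cp36 means Python 3.6 and amd64 means the 64-bit build.)
The install command is: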
pip install pywin32-221-cp36-cp36m-win_amd64.whl
Finally, install Scrapy
Enter the following in cmd:
pip install scrapy
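OK, after all that effort the Scrapy framework is finally installed. It really felt like passing through the proverbial eighty-one trials.
To sum up, the rough order for installing Scrapy is as follows.
Starting from a working Anaconda environment, install in this order:
1. pip install wheel
2. Download the Twisted build that matches your setup, then pip install the downloaded .whl file
3. pip install pywin32
4. pip install scrapy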
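Before running the last step, it can be useful to check which of the dependencies are already importable. The following is a minimal sketch using the standard library. The helper name missing_modules is mine, and the import names in the list (lxml, twisted, win32api, scrapy) are the ones I assume correspond to the packages discussed above.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be located."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names; win32api only applies on Windows.
    required = ["lxml", "twisted", "win32api", "scrapy"]
    missing = missing_modules(required)
    if missing:
        print("Not installed yet:", ", ".join(missing))
    else:
        print("All listed modules can be found.")
```

find_spec only locates a module without importing it, so a package can pass this check and still fail at import time, for example when a DLL it needs is missing.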
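A harder problem: the specified module cannot be found
The error says that the specified module could not be found.
I tried many fixes found online and none of them worked, which was frustrating.
So I uninstalled what I had installed, removing lxml, Twisted, and pywin32 in turn. With luck, reinstalling them is enough.
If you are not so lucky, one more thing needs updating: the OpenSSL version.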
conda install openssl=1.0.2p
That fixed it.
Reference: https://www.cnblogs.com/little-orangeaaa/p/10259973.html
Common Scrapy commands
List all commands
scrapy -h
Show help
scrapy --help
Show version information
(venv)ql@ql:~$ scrapy version
Scrapy 1.1.2
(venv)ql@ql:~$
(venv)ql@ql:~$ scrapy version -v
Scrapy    : 1.1.2
lxml      : 3.6.4.0
libxml2   : 2.9.4
Twisted   : 16.4.0
Python    : 2.7.12 (default, Jul 1 2016, 15:12:24) - [GCC 5.4.0 20160609]
pyOpenSSL : 16.1.0 (OpenSSL 1.0.2g-fips 1 Mar 2016)
Platform  : Linux-4.4.0-36-generic-x86_64-with-Ubuntu-16.04-xenial
(venv)ql@ql:~$
Create a new project
scrapy startproject project_name
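Create a spider with genspider
(genspider is short for "generate spider". A project can contain several spiders, but each spider's name must be unique.)
scrapy genspider name domain
# For example:
# scrapy genspider sohu sohu.org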
List the spiders in the current project
scrapy list
view
Open a web page in the browser
scrapy view http://www.baidu.com
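The shell command: enter Scrapy's interactive environment
# Enter the interactive environment for this URL
scrapy shell http://www.dmoz.org/Computers/Programming/Languages/Python/Books/
This drops you into an interactive environment, where the object we mainly use is response. For example:
response.xpath()  # put the XPath expression inside the parentheses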
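To get a feel for XPath expressions without starting the Scrapy shell, here is a small stand-in using the standard library's xml.etree.ElementTree. It is not Scrapy's API: ElementTree supports only a subset of XPath and needs well-formed markup, and the page content and the function name book_links are invented for illustration. Inside the Scrapy shell, a comparable query would be passed to response.xpath(), for example response.xpath("//div[@class='book']/a/text()").extract().

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <div class="book"><a href="/books/1">Book One</a></div>
    <div class="book"><a href="/books/2">Book Two</a></div>
    <div class="ad"><a href="/ads/1">Sponsored</a></div>
  </body>
</html>
"""


def book_links(markup):
    """Return (title, href) pairs for links inside div elements of class 'book'."""
    root = ET.fromstring(markup)
    # ElementTree understands a limited XPath subset, including
    # attribute predicates such as [@class='book'].
    links = root.findall(".//div[@class='book']/a")
    return [(link.text, link.get("href")) for link in links]


if __name__ == "__main__":
    for title, href in book_links(PAGE):
        print(title, href)
```

The predicate [@class='book'] is what keeps the sponsored link out of the result. The same idea carries over to the expressions you try against response in the shell.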
runspider
This command runs a spider directly from its Python file, without running the whole project.
scrapy runspider spider_file.py