Python Web Scraping (1): Common Problems When Installing the Scrapy Framework and How to Fix Them


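  Scrapy is a well-known and powerful application framework written for crawling website data. A framework here is simply a project template that bundles the relevant functionality and is general enough to reuse across projects.

  On Linux and Mac, installation takes a single pip command: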

pip install scrapy

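  On Windows, however, the installation has many pitfalls. Below I go through the ones I ran into while installing and configuring Scrapy on Windows 10, together with their fixes, in the hope that the summary helps.

  Package download page: https://www.lfd.uci.edu/~gohlke/pythonlibs/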

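Common problem 1: pip needs upgrading

  If your pip is fairly old, the installation may ask you to update it, so it is best to upgrade pip first.

  The upgrade command is as follows (run it in cmd):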

python -m pip install  --upgrade  pip

  Once the upgrade finishes, this class of problem is resolved.

Common problem 2: installing wheel

pip install  wheel

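  If wheel is not installed yet, this command installs it directly. If it is already installed, the command prints the message below and does not install it again.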

Requirement already satisfied: wheel in d:\python3\lib\site-packages

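Common problem 3: lxml is missing

  With wheel installed, the next step is to download the .whl file for lxml (do not rename the file). The build matching your setup is available from the LFD page (https://www.lfd.uci.edu/~gohlke/pythonlibs/). Mine is 64-bit Windows with Python 3.6.

  After downloading, open cmd and install the .whl file: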

pip install lxml-4.1.1-cp36-cp36m-win_amd64.whl

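  If lxml is not installed yet, this installs it. If it is already installed, pip prints the following, which also means everything is in place: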

Requirement already satisfied: lxml==4.1.1 from file:///D:/lxml-4.1.1-cp36-cp36m-win_amd64.whl in d:\python3\lib\site-packages

 

Common problem 4: path conflict

Error in sitecustomize; set PYTHONVERBOSE for traceback:
AttributeError: module 'sys' has no attribute 'setdefaultencoding'

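  This happens because sys.path contains an extra entry pointing at Python 2.7's site-packages, which conflicts with Python 3.

  Go to the "…/local/lib/python3.6/site-packages/" directory (the exact location varies from machine to machine) and delete the conflicting path recorded there, in this case in the homebrew.pth file: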

python -v homebrew.pth
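
  Before deleting anything, it can help to see which sys.path entries are actually causing the clash. The following is only a sketch: the function name conflicting_entries and the rule it applies (flag any entry whose path mentions a different Python major version) are assumptions made for illustration, not part of Python or Scrapy.

```python
import re
import sys

# Matches "python2", "python27", "python3.6", ... and captures the major version digit.
_PYTHON_DIR = re.compile(r"python(\d)", re.IGNORECASE)


def conflicting_entries(paths, major=sys.version_info.major):
    """Return the entries in `paths` that mention a different Python major version."""
    flagged = []
    for entry in paths:
        majors = {int(m) for m in _PYTHON_DIR.findall(entry)}
        if any(m != major for m in majors):
            flagged.append(entry)
    return flagged


if __name__ == "__main__":
    for entry in conflicting_entries(sys.path):
        print("possible conflict:", entry)
```

  Run it with the interpreter that shows the error. Any entry it prints is a candidate for the stray path that a .pth file added.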

 

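Common problem 5: Twisted is missing

  Download Twisted, choosing the build that matches your machine (mine is Python 3.6 on a 64-bit operating system; the cp36 in the file name means CPython 3.6, and amd64 means a 64-bit Python build).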
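
  If you are unsure which file to pick, the interpreter itself can tell you. This is a minimal sketch that assumes a Windows machine; wheel_tags is a hypothetical helper name, and it only covers the two tags mentioned here (cpXY and win_amd64 / win32).

```python
import struct
import sys


def wheel_tags():
    """Return (python_tag, platform_tag) for the running interpreter on Windows."""
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # The pointer size tells us whether this Python build is 32-bit or 64-bit.
    bits = struct.calcsize("P") * 8
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag, platform_tag


if __name__ == "__main__":
    python_tag, platform_tag = wheel_tags()
    print("look for a file name containing:", python_tag, "and", platform_tag)
```

  Note that what matters is the bitness of the Python build, not of the operating system: a 32-bit Python on 64-bit Windows still needs the win32 file.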

 

  After downloading, install it with:

pip install Twisted-17.9.0-cp36-cp36m-win_amd64.whl

  If Twisted was not installed yet, this installs it and reports success as follows:

Successfully installed Twisted-17.9.0

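Common problem 6: UnicodeDecodeError

(I had already worked past this pitfall by the time of writing, so the traces below are similar ones found online. The content is broadly the same and the problem is identical.)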

Exception:
  Traceback (most recent call last):
    File "c:\program files\python36\lib\site-packages\pip\compat\__init__.py", line 73, in console_to_str
      return s.decode(sys.__stdout__.encoding)
 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 34: invalid start byte
 
 During handling of the above exception, another exception occurred:
 
 Traceback (most recent call last):
   File "c:\program files\python36\lib\site-packages\pip\basecommand.py", line 215, in main
     status = self.run(options, args)
   File "c:\program files\python36\lib\site-packages\pip\commands\install.py", line 342, in run
     prefix=options.prefix_path,
   File "c:\program files\python36\lib\site-packages\pip\req\req_set.py", line 784, in install
     **kwargs
   File "c:\program files\python36\lib\site-packages\pip\req\req_install.py", line 878, in install
     spinner=spinner,
   File "c:\program files\python36\lib\site-packages\pip\utils\__init__.py", line 676, in call_subprocess
     line = console_to_str(proc.stdout.readline())
   File "c:\program files\python36\lib\site-packages\pip\compat\__init__.py", line 75, in console_to_str
     return s.decode('utf_8')
 UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 34: invalid start byte

  Or the following error:

Exception:
Traceback (most recent call last):
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\compat\__init__.py", line 73, in console_to_str
    return s.decode(sys.__stdout__.encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 34: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\commands\install.py", line 342, in run
    prefix=options.prefix_path,
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\req\req_set.py", line 784, in install
    **kwargs
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\req\req_install.py", line 878, in install
    spinner=spinner,
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\__init__.py", line 676, in call_subprocess
    line = console_to_str(proc.stdout.readline())
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\compat\__init__.py", line 75, in console_to_str
    return s.decode('utf_8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 34: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\commands\install.py", line 385, in run
    requirement_set.cleanup_files()
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\req\req_set.py", line 729, in cleanup_files
    req.remove_temporary_source()
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\req\req_install.py", line 977, in remove_temporary_sou
    rmtree(self.source_dir)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 212, in call
    raise attempt.get()
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\six.py", line 686, in reraise
    raise value
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\__init__.py", line 102, in rmtree
    onerror=rmtree_errorhandler)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\shutil.py", line 488, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\shutil.py", line 387, in _rmtree_unsafe
    onerror(os.rmdir, path, sys.exc_info())
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\__init__.py", line 114, in rmtree_errorhandler
    func(path)
PermissionError: [WinError 32] 另一個程序正在使用此文件,進程無法訪問。: 'C:\\Users\\59740\\AppData\\Local\\Temp\\pip-build-1djzmudb\\scrapy'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\basecommand.py", line 215, in main
    status = self.run(options, args)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\commands\install.py", line 385, in run
    requirement_set.cleanup_files()
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\build.py", line 38, in __exit__
    self.cleanup()
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\build.py", line 42, in cleanup
    rmtree(self.name)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 49, in wrapped_f
    return Retrying(*dargs, **dkw).call(f, *args, **kw)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 212, in call
    raise attempt.get()
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 247, in get
    six.reraise(self.value[0], self.value[1], self.value[2])
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\six.py", line 686, in reraise
    raise value
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\_vendor\retrying.py", line 200, in call
    attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\__init__.py", line 102, in rmtree
    onerror=rmtree_errorhandler)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\shutil.py", line 488, in rmtree
    return _rmtree_unsafe(path, onerror)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\shutil.py", line 378, in _rmtree_unsafe
    _rmtree_unsafe(fullname, onerror)
  File "c:\users\59740\appdata\local\programs\python\python36\lib\shutil.py", line 387, in _rmtree_unsafe
    onerror(os.rmdir, path, sys.exc_info())
  File "c:\users\59740\appdata\local\programs\python\python36\lib\site-packages\pip\utils\__init__.py", line 114, in rmtree_errorhandler
    func(path)
PermissionError: [WinError 32] 另一個程序正在使用此文件,進程無法訪問。: 'C:\\Users\\59740\\AppData\\Local\\Temp\\pip-build-1djzmudb\\scrapy

  

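Solution:

  Open

c:\program files\python36\lib\site-packages\pip\compat\__init__.py

  find

return s.decode('utf_8')

  and change it to

return s.decode('cp936')

  This is an encoding problem. Python 3 standardizes on UTF-8, but the console on a Chinese-locale Windows system still outputs GBK (cp936), so bytes such as 0xb6 cannot be decoded as UTF-8.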
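
  The mismatch can be reproduced without pip. The sketch below is not pip's code: decode_console is a hypothetical helper that tries UTF-8 first and falls back to cp936, which is one way to express the same fix without hard-coding a single encoding.

```python
def decode_console(raw, fallback="cp936"):
    """Decode console bytes as UTF-8, falling back to another code page on failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


if __name__ == "__main__":
    gbk_bytes = "文件".encode("cp936")  # bytes as a GBK console would emit them
    try:
        gbk_bytes.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("UTF-8 decoding fails:", exc)
    print("fallback recovers the text:", decode_console(gbk_bytes) == "文件")
```

  Plain ASCII output decodes the same way under both encodings, which is why the error only appears once the console prints Chinese text.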

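Common problem 7: win32 (pywin32) is missing

  When this module is missing, the following error appears:

ModuleNotFoundError: No module named 'win32api'

  Download pywin32, choosing the build that matches your machine (mine is Python 3.6 on a 64-bit operating system; the cp36 in the file name means CPython 3.6, and amd64 means a 64-bit Python build).

  The install command is: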

pip install pywin32-221-cp36-cp36m-win_amd64.whl

 

Finally, install Scrapy

  Enter the following in cmd:

pip install scrapy

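  OK, after all that struggle the Scrapy framework is finally installed. It really was an ordeal.

  To summarize, the rough order for installing Scrapy is as follows.

Starting from a working Anaconda environment, install in this order:

1. pip install wheel

2. Download the Twisted build that matches your setup, then pip install the downloaded .whl file

3. pip install pywin32

4. pip install scrapy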
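
  After working through the steps, a quick check of which pieces are still missing can save a failed install. This is a sketch only: missing_modules and the REQUIRED list are names invented here, and the list assumes the usual top-level import names (pywin32 is imported as win32api, Twisted as twisted).

```python
import importlib.util

# Top-level import names for the packages installed above (pywin32 provides win32api).
REQUIRED = ["wheel", "lxml", "twisted", "win32api", "scrapy"]


def missing_modules(names):
    """Return the names from `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    print("missing:", ", ".join(missing) if missing else "none")
```

  The check only looks the modules up and does not import them, so it will not catch a package that is present but fails to load.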

  

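Harder problem: the specified module could not be found

  The error reports that the specified module could not be found.

  I tried many fixes found online and none of them worked, which was frustrating.

  So I uninstalled everything I had installed, removing lxml, twisted, and pywin32 in turn. With luck, reinstalling them is enough.

  If not, one more thing needs changing: the OpenSSL version.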

conda install openssl=1.0.2p

  That fixes it.

 

  Reference: https://www.cnblogs.com/little-orangeaaa/p/10259973.html

 

Common Scrapy commands

  List all commands

scrapy -h

  Show help

scrapy --help

  Show version information

(venv)ql@ql:~$ scrapy version
Scrapy 1.1.2
(venv)ql@ql:~$ 
(venv)ql@ql:~$ scrapy version -v
Scrapy    : 1.1.2
lxml      : 3.6.4.0
libxml2   : 2.9.4
Twisted   : 16.4.0
Python    : 2.7.12 (default, Jul  1 2016, 15:12:24) - [GCC 5.4.0 20160609]
pyOpenSSL : 16.1.0 (OpenSSL 1.0.2g-fips  1 Mar 2016)
Platform  : Linux-4.4.0-36-generic-x86_64-with-Ubuntu-16.04-xenial
(venv)ql@ql:~$ 

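  Create a new project

scrapy startproject project_name

  Generate a spider with genspider (short for "generate spider"). A project can contain several spiders, but each name must be unique.

scrapy genspider name domain
# e.g.:
# scrapy genspider sohu sohu.org

  List the spiders in the current project

scrapy list

  view opens a page in the browser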

scrapy view http://www.baidu.com

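  The shell command enters Scrapy's interactive environment

# open an interactive session for this URL
scrapy shell http://www.dmoz.org/Computers/Programming/Languages/Python/Books/

  Inside the interactive environment, the main object to work with is response. For example:

response.xpath()    # put the XPath expression inside the parentheses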
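
  To get a feel for the kind of path expression that goes inside the parentheses without starting a Scrapy shell, the sketch below uses the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath. It is a stand-in, not Scrapy's response.xpath, and the sample markup and variable names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny, well-formed page standing in for a downloaded response body.
PAGE = """<html><body>
<div class="book"><a href="/a">Learning Python</a></div>
<div class="book"><a href="/b">Dive Into Python</a></div>
<div class="ad"><a href="/x">Buy now</a></div>
</body></html>"""

root = ET.fromstring(PAGE)

# Select every <a> directly inside a <div class="book">, anywhere in the tree.
anchors = root.findall(".//div[@class='book']/a")
titles = [a.text for a in anchors]
links = [a.get("href") for a in anchors]

if __name__ == "__main__":
    for title, link in zip(titles, links):
        print(title, link)
```

  Real pages are rarely well-formed XML, which is one reason to use Scrapy's own selectors in the shell rather than ElementTree for actual crawling.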

  The runspider command runs a spider directly from its Python file, without running the whole project

scrapy runspider spider_file.py

  

