Python crawler: configuring a proxy in Scrapy

Reposted article · 2016-08-22 11:23 · Tags: python / Scrapy

Reposted from: http://www.python_tab.com/html/2014/pythonweb_0326/724.html

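When crawling a website, the most common problem is that the site rate-limits or blocks individual IP addresses as an anti-scraping measure. The usual remedy is to rotate the IP address you crawl from, that is, to send requests through proxies.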

Below is how to configure Scrapy to crawl through a proxy.

1. Create a file named "middlewares.py" in the Scrapy project:

# base64 is only needed if the proxy requires authentication
import base64


class ProxyMiddleware:
    # Downloader middleware hook: called for every outgoing request
    def process_request(self, request, spider):
        # Route the request through the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Only needed if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # HTTP Basic credentials: b64encode works on bytes in Python 3 and,
        # unlike the removed base64.encodestring, adds no trailing newline
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode()
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
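The base64 step is the easiest part to get wrong: base64.encodestring was removed in Python 3.9, and its replacement base64.encodebytes appends a newline, which does not belong in an HTTP header value. A minimal sketch of building the header value with b64encode instead (basic_proxy_auth is a hypothetical helper name, not part of Scrapy), checked against the example credentials from RFC 7617:

```python
import base64


def basic_proxy_auth(username: str, password: str) -> str:
    """Build a Proxy-Authorization value for HTTP Basic auth (hypothetical helper)."""
    credentials = f"{username}:{password}".encode("utf-8")
    # b64encode never inserts newlines, unlike encodebytes/encodestring
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    print(basic_proxy_auth("Aladdin", "open sesame"))
    # prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

    # encodebytes (the Python 3 successor to encodestring) appends "\n"
    print(repr(base64.encodebytes(b"Aladdin:open sesame")))
    # prints: b'QWxhZGRpbjpvcGVuIHNlc2FtZQ==\n'
```

Inside the middleware this would be used as request.headers['Proxy-Authorization'] = basic_proxy_auth("USERNAME", "PASSWORD").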

2. Add the following to the project settings file (./pythontab/settings.py):

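DOWNLOADER_MIDDLEWARES = {
    # process_request runs in ascending order, so ProxyMiddleware (100) sets
    # request.meta['proxy'] before HttpProxyMiddleware (110) handles it
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'pythontab.middlewares.ProxyMiddleware': 100,
}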

Done.
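The middleware above always uses one fixed proxy, whereas the IP rotation mentioned at the start needs a pool of proxies. A minimal sketch, assuming a custom setting named PROXY_LIST and a class named RotatingProxyMiddleware (both hypothetical, not built into Scrapy), picks a random proxy for each request:

```python
import random


class RotatingProxyMiddleware:
    """Hypothetical downloader middleware: one random proxy per request."""

    def __init__(self, proxies):
        if not proxies:
            raise ValueError("PROXY_LIST must contain at least one proxy URL")
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a custom setting assumed for this sketch
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Keep a proxy that was already set explicitly on the request
        if "proxy" not in request.meta:
            request.meta["proxy"] = random.choice(self.proxies)
        # Returning None lets Scrapy continue processing the request
        return None
```

To use it, register 'pythontab.middlewares.RotatingProxyMiddleware': 100 in DOWNLOADER_MIDDLEWARES in place of ProxyMiddleware and define PROXY_LIST = ["http://PROXY_IP_1:PORT", "http://PROXY_IP_2:PORT"] in settings.py. Random choice is the simplest policy; it does not detect or skip dead or banned proxies.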

