Setting a Proxy in Scrapy (repost)


1. From: http://www.sharejs.com/codes/Python/8309

 

1. Create a new file named "middlewares.py" in the Scrapy project:

# Import base64: it is needed ONLY if the proxy you are going to use requires authentication
import base64


# Start your middleware class
class ProxyMiddleware(object):
    # Override process_request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up basic authentication for the proxy.
        # b64encode takes bytes in Python 3 and never inserts line breaks
        # (the old base64.encodestring was removed in Python 3.9).
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode()
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass


# This snippet comes from: http://www.sharejs.com/codes/Python/8309
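The only non-obvious part of the middleware is the header value. As a minimal sketch, assuming standard HTTP Basic authentication (RFC 7617), the hypothetical helper `proxy_auth_header` builds it in isolation so you can check the value before wiring it into Scrapy:

```python
import base64


def proxy_auth_header(user: str, password: str) -> str:
    """Hypothetical helper: build a Proxy-Authorization value for HTTP Basic auth."""
    # Basic auth sends base64("user:password") with no line breaks.
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Inside ProxyMiddleware.process_request you would assign this to
    # request.headers['Proxy-Authorization'].
    print(proxy_auth_header("USERNAME", "PASSWORD"))
```

Keeping the encoding in one small function also makes it easy to reuse if you later read the credentials from settings instead of hard-coding them.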

2. Add the following to the project settings file (./project_name/settings.py):

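# Lower numbers run process_request first, so ProxyMiddleware (100) sets
# meta['proxy'] before HttpProxyMiddleware (110) handles the request.
# Scrapy versions before 1.0 used the path
# 'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware'.
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    'project_name.middlewares.ProxyMiddleware': 100,
}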
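Why 100 and 110? Scrapy calls each downloader middleware's process_request in ascending order of these numbers, so the custom middleware must have the smaller one. A minimal sketch that reproduces just that ordering rule (the function `process_request_order` is hypothetical and only mimics sorting, not Scrapy's full merge with its built-in defaults):

```python
# Mirrors the settings dict above (project_name is a placeholder).
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 110,
    "project_name.middlewares.ProxyMiddleware": 100,
}


def process_request_order(middlewares: dict) -> list:
    """Hypothetical helper: list middlewares in the order process_request runs."""
    # A value of None disables a middleware in Scrapy, so drop those entries.
    enabled = {path: order for path, order in middlewares.items() if order is not None}
    # Ascending order: the smallest number sees the request first.
    return sorted(enabled, key=enabled.get)


if __name__ == "__main__":
    for path in process_request_order(DOWNLOADER_MIDDLEWARES):
        print(path)
```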

That's all it takes: with these two steps, requests now go through the proxy. Let's test it ^_^

import scrapy


# A plain scrapy.Spider: CrawlSpider uses parse() internally, so overriding
# parse() in a CrawlSpider (as the original did) breaks its rules.
class TestSpider(scrapy.Spider):
    name = "test"
    # Use any page that shows the client IP. The original whatismyip URL is subject
    # to change; the latest one was listed at http://www.whatismyip.com/faq/automation.asp
    start_urls = ["http://xujian.info"]

    def parse(self, response):
        # Save the page so you can check which IP address the site saw
        with open('test.html', 'wb') as f:
            f.write(response.body)
# This snippet comes from: http://www.sharejs.com/codes/Python/8309
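Reading test.html by eye works, but the check can also be automated. A minimal sketch, assuming you point the spider at an IP-echo endpoint such as http://httpbin.org/ip (not the URL used above), which returns JSON like {"origin": "1.2.3.4"}. The helper `response_uses_proxy` is hypothetical, and it only makes sense when the proxy's outgoing address equals the IP in its URL:

```python
import json
from urllib.parse import urlsplit


def response_uses_proxy(body: bytes, proxy_url: str) -> bool:
    """Hypothetical helper: True if an IP-echo response reports the proxy's IP."""
    origin = json.loads(body)["origin"]
    proxy_host = urlsplit(proxy_url).hostname
    # The echoed origin may be a comma-separated chain of addresses.
    return proxy_host in [part.strip() for part in origin.split(",")]


if __name__ == "__main__":
    # In the spider you would call response_uses_proxy(response.body, proxy_url).
    print(response_uses_proxy(b'{"origin": "203.0.113.7"}', "http://203.0.113.7:8080"))
```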

2. From: http://blog.csdn.net/haipengdai/article/details/50972983

http://stackoverflow.com/questions/4710483/scrapy-and-proxies

Add a file named middlewares.py in the same directory as settings.py:

import base64


class ProxyMiddleware(object):
    # Override process_request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up basic authentication for the proxy
        # (b64encode takes bytes in Python 3, so encode first and decode the result)
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode()
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
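The Scrapy documentation for HttpProxyMiddleware also accepts credentials embedded in the proxy URL, e.g. http://username:password@some_proxy_server:port, and builds the header itself. To show the idea (this is an illustrative sketch, not Scrapy's own code), the hypothetical helper `split_proxy_credentials` separates the credentials from such a URL and produces the same Basic header the middleware computes by hand:

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_proxy_credentials(proxy_url: str):
    """Hypothetical helper: return (proxy URL without credentials, header or None)."""
    parts = urlsplit(proxy_url)
    if parts.username is None:
        return proxy_url, None
    # Credentials in URLs are percent-encoded, so decode them first.
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    host = parts.hostname
    if ":" in host:  # IPv6 literal: restore the brackets urlsplit removed
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    clean = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return clean, "Basic " + token


if __name__ == "__main__":
    print(split_proxy_credentials("http://USERNAME:PASSWORD@10.0.0.1:3128"))
```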

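Many answers online encode proxy_user_pass with base64.encodestring. That fails when the username (or the whole USERNAME:PASSWORD string) is long: encodestring inserts a newline after every 76 output characters and always appends a trailing newline, which corrupts the Proxy-Authorization header. base64.b64encode never adds line breaks, so it is the recommended choice. (encodestring was a deprecated alias of base64.encodebytes and was removed in Python 3.9.)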
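The difference is easy to reproduce. This sketch uses base64.encodebytes (the Python 3 name of encodestring) with a made-up 121-byte credential string, long enough to cross the 76-character line limit:

```python
import base64

# Made-up credentials: 60 + 1 + 60 = 121 bytes, i.e. 164 base64 characters.
creds = ("u" * 60 + ":" + "p" * 60).encode("ascii")

wrapped = base64.encodebytes(creds)  # what encodestring used to do
flat = base64.b64encode(creds)       # what the middleware should use

print(wrapped.count(b"\n"))  # newlines that would end up inside the header
print(flat.count(b"\n"))
# Same base64 data, only the line breaks differ.
print(wrapped.replace(b"\n", b"") == flat)
```

Even for short credentials encodebytes still appends a trailing newline (base64.encodebytes(b"a:b") is b"YTpi\n"), so b64encode is the safer choice regardless of length.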

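Then enable it in DOWNLOADER_MIDDLEWARES in settings.py with an entry such as 'projectname.middlewares.ProxyMiddleware': 1, and you're done. Priority 1 puts it ahead of the built-in HttpProxyMiddleware (750 by default), so meta['proxy'] is already set when that middleware runs.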
