Python Crawler Study Notes: Getting Around Douban's Anti-Crawler Measures

Reposted article. Originally published 2016-01-14 11:41. Tags: crawler, Python

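Once you start testing a crawler, you will find that your IP keeps getting banned, most likely because you are sending too many requests per unit of time. The simplest fix is to lower the request rate, but what if you do not want to? After some searching, the simplest alternative turned out to be rotating through proxy IPs. I found some methods and free proxy IPs online, tried them out, and they worked. The proxy list I used is http://www.xicidaili.com/nn/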

The code for collecting the proxies is as follows:

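import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent, the same one used in the test script later in this post.
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
httpResults, httpsResults = [], []  # collected proxy URLs, grouped by protocol

for page in range(1, 5):  # pages 1 to 4 of the proxy list
    IPurl = 'http://www.xicidaili.com/nn/%s' % page
    rIP = requests.get(IPurl, headers=headers)
    IPContent = rIP.text
    soupIP = BeautifulSoup(IPContent, "html5lib")  # requires the html5lib package
    trs = soupIP.find_all('tr')
    for tr in trs[1:]:  # skip the table header row
        tds = tr.find_all('td')
        # Column positions depend on the site's table layout and may need adjusting.
        ip = tds[2].text.strip()
        port = tds[3].text.strip()
        protocol = tds[6].text.strip()
        if protocol == 'HTTP':
            httpResults.append('http://' + ip + ':' + port)
        elif protocol == 'HTTPS':
            httpsResults.append('https://' + ip + ':' + port)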

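Requests lets you pass proxies directly when making a request, so I build the results in the format that the proxies argument expects. According to the Requests documentation, proxies are specified like this: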

import requests

proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "http://10.10.1.10:1080",
}

requests.get("http://example.org", proxies=proxies)
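The rotation idea can be sketched as follows. This is a minimal illustration rather than the author's code: the helper name `proxy_rotation` and the sample addresses are assumptions, and it simply cycles through two lists of proxy URLs, producing a dict in the proxies format shown above on each step.

```python
import itertools


def proxy_rotation(http_proxies, https_proxies):
    """Return a generator of requests-style proxies dicts that cycles through both lists.

    Hypothetical helper for illustration; both lists must be non-empty.
    """
    if not http_proxies or not https_proxies:
        raise ValueError("both proxy lists must contain at least one entry")

    def generate():
        http_cycle = itertools.cycle(http_proxies)
        https_cycle = itertools.cycle(https_proxies)
        while True:
            yield {"http": next(http_cycle), "https": next(https_cycle)}

    return generate()


# Placeholder addresses for illustration only; substitute proxies you collected.
rotation = proxy_rotation(
    ["http://10.10.1.10:3128", "http://10.10.1.11:3128"],
    ["http://10.10.1.10:1080"],
)
first = next(rotation)
second = next(rotation)
third = next(rotation)
```

Each request can then take the next entry, for example `requests.get(url, headers=headers, proxies=next(rotation))`, so consecutive requests leave through different proxies.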

To test, request http://www.ip.cn, which reports the IP address your request appears to come from. The code is as follows:

import re
import requests

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36",
}
# Example proxies; replace them with ones that currently work.
proxies = {
    "http": 'http://122.193.14.102:80',
    "https": "http://120.203.18.33:8123"
}
r = requests.get('http://www.ip.cn', headers=headers, proxies=proxies)
content = r.text
# The original pattern looks for the IP between <code> and </code> tags.
ip = re.search(r'<code>(.*?)</code>', content)
if ip:  # re.search returns None when the pattern is not found
    print(ip.group(1))

Replace the proxies above with ones that actually work for you.
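Because free proxies often stop working, it can help to filter the collected list before using it. The sketch below is one possible approach under stated assumptions, not part of the original post: `filter_working` and `check_with_requests` are hypothetical names, and the 5-second timeout and the choice of test URL are arbitrary.

```python
def filter_working(candidates, check):
    """Return the proxy URLs for which check(proxy_url) is truthy.

    Any candidate whose check raises an exception is treated as unusable.
    """
    working = []
    for proxy_url in candidates:
        try:
            if check(proxy_url):
                working.append(proxy_url)
        except Exception:
            # A dead proxy typically surfaces as a connection or timeout error.
            continue
    return working


def check_with_requests(proxy_url, test_url="http://www.ip.cn", timeout=5):
    """Hypothetical checker: True if test_url answers with HTTP 200 via the proxy."""
    import requests  # imported lazily so filter_working works without requests

    response = requests.get(
        test_url,
        proxies={"http": proxy_url, "https": proxy_url},
        timeout=timeout,
    )
    return response.status_code == 200
```

A call such as `filter_working(httpResults, check_with_requests)` would then keep only the proxies that responded, where `httpResults` stands for whatever list of proxy URLs you collected.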

References: http://docs.python-requests.org/zh_CN/latest/user/advanced.html

http://www.oschina.net/code/snippet_2463131_51169
