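Web Scraping: Using Proxies


Using a proxy IP

1. Using a proxy with requests

  To use a proxy with requests, build a dictionary that maps each URL scheme to a proxy URL and pass it through the proxies parameter.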

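import requests

# Proxy address (append :port if the proxy does not listen on the default port)
proxy = '60.186.9.233'
# Keys are the scheme of the target URL; values are the proxy URL used for it
proxies = {
    'http': 'http://' + proxy,
    'https': 'https://' + proxy
}
try:
    # httpbin.org/get echoes the request, including the caller's IP as "origin"
    res = requests.get('http://httpbin.org/get', proxies=proxies)
    print(res.text)
except requests.exceptions.ConnectionError as e:
    # Raised when the proxy or the target host cannot be reached
    print('error', e.args)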

Output:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "origin": "60.186.9.233", 
  "url": "https://httpbin.org/get"
}

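  The origin field in the output is the proxy's IP, which shows that the proxy was applied. If the proxy requires authentication, put the username and password in front of the proxy address: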

proxy = 'username:password@60.186.9.233'
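
  The two steps above (building the proxies dictionary, then checking origin) can be sketched as a pair of small helpers. This is only an illustration under stated assumptions: build_proxies and origin_matches_proxy are hypothetical names that do not appear in the original, the proxy is assumed to be a plain HTTP proxy used for both http and https targets, and credentials are percent-encoded so that characters such as '@' do not break the proxy URL.

```python
from urllib.parse import quote


def build_proxies(host, port=None, username=None, password=None):
    """Return a dict suitable for the proxies parameter of requests (hypothetical helper)."""
    address = host if port is None else f"{host}:{port}"
    if username is not None and password is not None:
        # Percent-encode credentials so ':' or '@' inside them stay unambiguous.
        credentials = f"{quote(username, safe='')}:{quote(password, safe='')}@"
    else:
        credentials = ""
    proxy_url = f"http://{credentials}{address}"
    # Assumption: the same HTTP proxy serves both http:// and https:// targets.
    return {"http": proxy_url, "https": proxy_url}


def origin_matches_proxy(payload, proxy_ip):
    """Check whether the parsed httpbin response reports the proxy IP as origin."""
    # The origin value may hold several comma-separated addresses.
    origins = [ip.strip() for ip in payload.get("origin", "").split(",")]
    return proxy_ip in origins
```

  The dictionary can then be passed as requests.get(url, proxies=build_proxies(...)), and res.json() handed to origin_matches_proxy together with the proxy IP.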

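2. Using a proxy with Selenium

  Selenium can also be configured to use a proxy. There are two cases: a browser with a visible window, with Chrome as the example, and a headless browser, with PhantomJS as the example.

Chrome settings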

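  Set the proxy on a ChromeOptions object and pass that object in when creating the Chrome object (through the options parameter; older Selenium releases named it chrome_options). Running the code opens a Chrome window, and once the page has loaded it shows the result below.

# Chrome proxy settings
from selenium import webdriver

proxy = '60.186.9.233'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=http://' + proxy)
# Recent Selenium releases take options=; older ones used chrome_options=
browser = webdriver.Chrome(options=chrome_options)
# get() returns None; the page content is read from the browser itself
browser.get('http://httpbin.org/get')

Output:
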
{
  "args": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "zh-CN,zh;q=0.9", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
  }, 
  "origin": "60.186.9.233", 
  "url": "https://httpbin.org/get"
}

 

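PhantomJS settings

  Define the command-line options as a list and pass it to PhantomJS through the service_args parameter at initialization. Note that PhantomJS support was removed in Selenium 4, so this code requires a Selenium 3.x release.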

# PhantomJS proxy settings
from selenium import webdriver

service_args = [
    '--proxy=60.186.9.233',
    '--proxy-type=http'
]
browser = webdriver.PhantomJS(service_args=service_args)
browser.get('http://httpbin.org/get')
print(browser.page_source)

Output:

{
  "args": {}, 
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "zh-CN,zh;q=0.9", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36"
  }, 
  "origin": "60.186.9.233", 
  "url": "https://httpbin.org/get"
}

If the proxy requires authentication, add the --proxy-auth option to service_args.

service_args = [
    '--proxy=60.186.9.233',
    '--proxy-type=http',
    '--proxy-auth=username:password'
]
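
On Selenium releases without PhantomJS, a similar headless setup can be approximated with Chrome. The sketch below is not part of the original article: chrome_proxy_args and make_headless_chrome are hypothetical names, and it assumes an unauthenticated HTTP proxy plus a local Chrome and driver that Selenium can launch. Selenium is imported inside the function so that the argument-building helper can be used on its own.

```python
def chrome_proxy_args(proxy, scheme="http", headless=True):
    """Build the Chrome command-line arguments for a proxy (hypothetical helper)."""
    args = [f"--proxy-server={scheme}://{proxy}"]
    if headless:
        # Run Chrome without opening a window.
        args.append("--headless")
    return args


def make_headless_chrome(proxy):
    """Create a headless Chrome driver that sends its traffic through the proxy."""
    # Imported here so chrome_proxy_args works even where Selenium is not installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in chrome_proxy_args(proxy):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Usage would mirror the PhantomJS example: browser = make_headless_chrome('60.186.9.233'), then browser.get('http://httpbin.org/get') and print(browser.page_source).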

 

