Python Crawlers: Using Proxies


Setting up a proxy

To use a proxy with the urllib library, the code is as follows:

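from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

proxy = "113.116.50.182:808"
# The keys name the scheme of the target URL. Both entries point at the same
# plain HTTP proxy, so both proxy URLs use http://.
proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "http://" + proxy,
})
opener = build_opener(proxy_handler)
try:
    response = opener.open("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    # Report why the request failed instead of discarding the exception.
    print("Proxy is not usable:", e.reason)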

If the output looks like the following, the proxy has been set up successfully:

{
  "origin": "113.116.50.182, 113.116.50.182"
}
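Instead of checking the output by eye, the comparison can be automated. The following is a minimal sketch, assuming the response body has the shape shown above (a JSON object whose "origin" field may list the address more than once) and that the proxy is given as an IP address rather than a hostname. The name origin_matches_proxy is a hypothetical helper, not part of urllib.

```python
import json


def origin_matches_proxy(body: str, proxy: str) -> bool:
    """Return True if every address in the "origin" field is the proxy's IP.

    `body` is the raw JSON text returned by http://httpbin.org/ip and
    `proxy` is a "host:port" string such as "113.116.50.182:808".
    """
    proxy_ip = proxy.rsplit(":", 1)[0]
    origin = json.loads(body)["origin"]
    # The field may hold several comma-separated addresses.
    addresses = [part.strip() for part in origin.split(",")]
    return all(address == proxy_ip for address in addresses)


sample = '{"origin": "113.116.50.182, 113.116.50.182"}'
print(origin_matches_proxy(sample, "113.116.50.182:808"))  # True
```

In the urllib example, the string returned by response.read().decode() could be passed in as body.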

 

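For a proxy that requires authentication, only the proxy variable needs to change: put the username and password in front of the proxy address, for example "username:password@113.116.50.182:808".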

from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

# The credentials go in front of the host, separated from it by "@".
proxy = "username:password@113.116.50.182:808"
proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "http://" + proxy,
})
opener = build_opener(proxy_handler)
try:
    response = opener.open("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    # Report why the request failed instead of discarding the exception.
    print("Proxy is not usable:", e.reason)
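One practical detail with the "username:password@host:port" form: a password that itself contains "@" or ":" makes the string ambiguous. A common remedy is to percent-encode the credentials before building the proxy URL. The sketch below assumes an HTTP proxy; build_proxy_url is an illustrative name, and the password "p@ss:word" is made up for the example.

```python
from urllib.parse import quote


def build_proxy_url(host: str, port: int, username: str, password: str) -> str:
    """Build "http://user:pass@host:port" with the credentials percent-encoded."""
    # safe="" encodes every reserved character, including "@", ":" and "/".
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host}:{port}"


print(build_proxy_url("113.116.50.182", 808, "username", "p@ss:word"))
# http://username:p%40ss%3Aword@113.116.50.182:808
```

The returned string can be used directly as a value in the dictionary passed to ProxyHandler.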

 

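If you run into a SOCKS proxy server:

A proxy server that speaks the SOCKS protocol is called a SOCKS server; it is a general-purpose proxy. SOCKS is a low-level, circuit-level gateway originally developed by David Koblas in the early 1990s, and it has since been published as an open standard in the Internet RFCs (SOCKS version 5 is RFC 1928). SOCKS does not tie applications to a particular operating system platform. Unlike application-layer proxies such as HTTP proxies, a SOCKS proxy simply relays data and does not care which application protocol is being carried (for example FTP, HTTP, or NNTP). Because it does not parse the application traffic, a SOCKS proxy usually adds less overhead than an application-layer proxy.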
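To make the "relay only" behaviour concrete: in SOCKS version 5 (RFC 1928) the client sends a short greeting, then a CONNECT request naming the target host and port, and once the server accepts, the same TCP connection carries the application's bytes unchanged. The sketch below only builds those two client messages and does not contact any server; the function names are illustrative.

```python
import struct


def socks5_greeting() -> bytes:
    """Client greeting: version 5, one method offered, 0x00 = no authentication."""
    return b"\x05\x01\x00"


def socks5_connect_request(host: str, port: int) -> bytes:
    """CONNECT request that addresses the target by domain name."""
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("domain name too long for a SOCKS5 request")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name), then the name length.
    header = bytes([0x05, 0x01, 0x00, 0x03, len(name)])
    # The port is a 16-bit unsigned integer in network (big-endian) byte order.
    return header + name + struct.pack("!H", port)


print(socks5_greeting().hex())
print(socks5_connect_request("httpbin.org", 80).hex())
```

Nothing in either message mentions HTTP, which is why the same proxy can carry any TCP-based protocol.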

The setup in code is as follows:

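import socks  # provided by the PySocks package
import socket
from urllib import request
from urllib.error import URLError

# Route every socket created in this process through the SOCKS5 proxy.
socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
socket.socket = socks.socksocket

try:
    response = request.urlopen("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    # Report why the request failed instead of discarding the exception.
    print("Proxy is not usable:", e.reason)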
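Assigning to socket.socket, as the code above does, affects every connection the process opens afterwards. If only some requests should go through the proxy, one option is to restore the original class when done. The following is a sketch under that assumption; patched_socket is a hypothetical helper, and the PySocks usage is shown only as a comment so the block runs without that package installed.

```python
import contextlib
import socket


@contextlib.contextmanager
def patched_socket(replacement):
    """Temporarily replace socket.socket and restore the original on exit."""
    original = socket.socket
    socket.socket = replacement
    try:
        yield
    finally:
        # Runs even if the body raises, so the patch never leaks.
        socket.socket = original


# Hypothetical usage with PySocks, mirroring the code above:
#
#     import socks
#     from urllib import request
#
#     socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
#     with patched_socket(socks.socksocket):
#         response = request.urlopen("http://httpbin.org/ip")
```

Connections opened outside the with block use the normal socket class again.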


 

Setting a proxy in the requests library

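import requests

proxy = "113.116.50.182:808"
# An ordinary HTTP proxy is addressed with http:// in both entries. An https://
# proxy URL would make requests open a TLS connection to the proxy itself.
proxies = {
    "http": "http://" + proxy,
    "https": "http://" + proxy,
}
try:
    response = requests.get("http://httpbin.org/ip", proxies=proxies)
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)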

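This is much simpler than configuring a proxy in urllib. For a proxy that requires authentication, proxy = "username:password@113.116.50.182:808" works here as well, so it is not demonstrated again.

To use a SOCKS5 proxy with the requests library, set it up as follows: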

import requests
import socks  # provided by the PySocks package
import socket

# Route every socket created in this process through the SOCKS5 proxy.
socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
socket.socket = socks.socksocket

try:
    response = requests.get("http://httpbin.org/ip")
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)
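As an alternative to patching socket.socket for the whole process, requests also accepts SOCKS proxy URLs in the proxies argument when its optional SOCKS support is installed (pip install "requests[socks]"). The sketch below assumes that extra is available; socks5_proxies and fetch_ip are illustrative names. The "socks5h" scheme asks the proxy to resolve hostnames, while "socks5" resolves them locally.

```python
def socks5_proxies(host: str, port: int, remote_dns: bool = True) -> dict:
    """Build a requests-style proxies mapping for a SOCKS5 proxy."""
    scheme = "socks5h" if remote_dns else "socks5"
    url = f"{scheme}://{host}:{port}"
    return {"http": url, "https": url}


def fetch_ip(proxies: dict) -> str:
    """Fetch http://httpbin.org/ip through the given proxies and return the body."""
    # Imported here so socks5_proxies() can be used without requests installed.
    import requests

    response = requests.get("http://httpbin.org/ip", proxies=proxies, timeout=10)
    return response.text


print(socks5_proxies("113.116.50.182", 807))
```

Calling fetch_ip(socks5_proxies("113.116.50.182", 807)) would then send only that one request through the proxy.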


 

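Setting a proxy in Selenium

Because the headless browser PhantomJS is no longer maintained, only Chrome, a browser with a visible window, is demonstrated here.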

from selenium import webdriver

proxy = "113.116.50.182:808"
chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument('--proxy-server=http://'+proxy)
driver = webdriver.Chrome(executable_path=r"C:\Users\Administrator\Downloads\chromedriver.exe",options=chromeOptions)

driver.get("http://httpbin.org/ip")
print(driver.page_source)

The page source returned is:

<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{
  "origin": "113.116.50.182, 113.116.50.182"
}
</pre></body></html>

Note: the options object must now be passed through the options keyword argument, which replaces the older chrome_options keyword.
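In the same spirit, Selenium 4 deprecated the executable_path argument in favour of a Service object, and later 4.x releases removed it. The following is a sketch of the equivalent setup, assuming a Selenium 4 installation; proxy_argument and make_driver are illustrative names, and the driver path is whatever applies on the local machine.

```python
def proxy_argument(proxy: str, scheme: str = "http") -> str:
    """Return the Chrome command-line switch for a "host:port" proxy."""
    return f"--proxy-server={scheme}://{proxy}"


def make_driver(proxy: str, driver_path: str):
    """Start Chrome through the proxy using the Selenium 4 Service API."""
    # Imported lazily so proxy_argument() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(proxy))
    return webdriver.Chrome(service=Service(driver_path), options=options)


print(proxy_argument("113.116.50.182:808"))
# --proxy-server=http://113.116.50.182:808
```

A call such as make_driver("113.116.50.182:808", r"C:\Users\Administrator\Downloads\chromedriver.exe") returns a driver that can be used exactly like the one above.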

 

Using an authenticated proxy in Selenium is a little more involved. The following code can be adapted directly for later use:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import zipfile

ip = '113.116.50.182'
port = 808
username = 'xxxx'
password = 'xxxx'

manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    }
}
"""

background_js = """
var config = {
        mode: "fixed_servers",
        rules: {
          singleProxy: {
            scheme: "http",
            host: "%(ip)s",
            port: %(port)s
          }
        }
      }
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
function callbackFn(details) {
    return {
        authCredentials: {
            username: "%(username)s",
            password: "%(password)s"
        }
    }
}
chrome.webRequest.onAuthRequired.addListener(
            callbackFn,
            {urls: ["<all_urls>"]},
            ['blocking']
)
""" % {'ip': ip, 'port': port, 'username': username, 'password': password}

plugin_file = 'proxy_auth_plugin.zip'
with zipfile.ZipFile(plugin_file, 'w') as zp:
    zp.writestr("manifest.json", manifest_json)
    zp.writestr("background.js", background_js)
chrome_options = Options()
chrome_options.add_argument("--start-maximized")
chrome_options.add_extension(plugin_file)
browser = webdriver.Chrome(executable_path=r"C:\Users\Administrator\Downloads\chromedriver.exe",options=chrome_options)
browser.get('http://httpbin.org/ip')
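The script above pastes the credentials into the JavaScript source with % formatting, which breaks if a password contains a double quote or a backslash. One way to avoid that is to serialise the values with json.dumps, since a JSON literal is also a valid JavaScript literal. The sketch below wraps the same manifest and background script in a function under that assumption; write_proxy_auth_extension is a hypothetical name, and the password 'pa"ss' is made up to show the escaping.

```python
import json
import os
import tempfile
import zipfile

MANIFEST = {
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                    "<all_urls>", "webRequest", "webRequestBlocking"],
    "background": {"scripts": ["background.js"]},
}


def write_proxy_auth_extension(path, ip, port, username, password):
    """Write the proxy-auth extension zip to `path` and return the path."""
    config = {
        "mode": "fixed_servers",
        "rules": {"singleProxy": {"scheme": "http", "host": ip, "port": int(port)}},
    }
    credentials = {"authCredentials": {"username": username, "password": password}}
    # json.dumps escapes quotes and backslashes, so any credential is safe to embed.
    background_js = (
        "chrome.proxy.settings.set({value: %s, scope: 'regular'}, function() {});\n"
        "chrome.webRequest.onAuthRequired.addListener(\n"
        "    function(details) { return %s; },\n"
        "    {urls: ['<all_urls>']},\n"
        "    ['blocking']\n"
        ");\n"
    ) % (json.dumps(config), json.dumps(credentials))
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("manifest.json", json.dumps(MANIFEST, indent=4))
        archive.writestr("background.js", background_js)
    return path


with tempfile.TemporaryDirectory() as tmp:
    plugin = write_proxy_auth_extension(
        os.path.join(tmp, "proxy_auth_plugin.zip"),
        "113.116.50.182", 808, "xxxx", 'pa"ss',
    )
    with zipfile.ZipFile(plugin) as archive:
        print(archive.namelist())  # ['manifest.json', 'background.js']
```

The returned path can be handed to chrome_options.add_extension() in place of plugin_file.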


 

