Configuring Proxies
To use a proxy with the urllib library, the code is as follows:
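from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

# Address of the proxy server, written as "host:port"
proxy = "113.116.50.182:808"

# Send both HTTP and HTTPS requests through the proxy
proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "https://" + proxy,
})
opener = build_opener(proxy_handler)

try:
    # httpbin.org/ip echoes the IP address the request came from
    response = opener.open("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("The proxy IP is unusable:", e.reason)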
Output like the following means the proxy was set up successfully:
{ "origin": "113.116.50.182, 113.116.50.182" }
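The same check can be automated rather than read by eye. The following is a minimal sketch, assuming the response body has the JSON shape shown above; the helper name proxy_in_effect is hypothetical and not part of any library.

```python
import json


def proxy_in_effect(body: str, proxy_host: str) -> bool:
    """Return True if every address reported in "origin" is the proxy's host."""
    origin = json.loads(body)["origin"]
    # The origin field may list the address more than once, separated by commas
    addresses = [part.strip() for part in origin.split(",") if part.strip()]
    return bool(addresses) and all(addr == proxy_host for addr in addresses)


# Sample body copied from the output shown above
sample = '{ "origin": "113.116.50.182, 113.116.50.182" }'
print(proxy_in_effect(sample, "113.116.50.182"))
```

If the origin field contains any address other than the proxy's, the function returns False, which would suggest that the request did not go through the proxy or that the proxy forwards the client address.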
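For a proxy that requires authentication, only the proxy variable needs to change: put the username and password in front of the proxy address, for example "username:password@113.116.50.182"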
from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

# Credentials go in front of the address as "username:password@host:port"
proxy = "username:password@113.116.50.182:808"

proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "https://" + proxy,
})
opener = build_opener(proxy_handler)

try:
    response = opener.open("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("The proxy IP is unusable:", e.reason)
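A username or password that contains characters such as "@", ":" or "/" would make the "username:password@host:port" form ambiguous. Percent-encoding is the standard way to carry such characters in the user-info part of a URL. The following is a minimal sketch of that idea; the helper name build_proxy_url and the sample password are assumptions made for illustration.

```python
from urllib.parse import quote


def build_proxy_url(host, port, username=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding the credentials if present."""
    if username is None:
        return f"{scheme}://{host}:{port}"
    # safe="" makes quote() also encode "/", "@" and ":" inside the credentials
    user = quote(username, safe="")
    pwd = quote(password or "", safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


# The password below is a made-up example containing reserved characters
print(build_proxy_url("113.116.50.182", 808, "username", "p@ss:word"))
```

The resulting string can be used as a value in the dictionary passed to ProxyHandler.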
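If you run into a SOCKS proxy server:
A proxy server that speaks the SOCKS protocol is called a SOCKS server, and it is a general-purpose proxy server. SOCKS is a low-level, circuit-level gateway developed by David Koblas in 1990, and it has since been an open standard among the Internet RFCs. SOCKS does not tie applications to a particular operating system platform. Unlike application-layer or HTTP-layer proxies, a SOCKS proxy simply relays data packets without caring which application protocol they carry (FTP, HTTP, or NNTP requests, for example). As a result, a SOCKS proxy generally adds less overhead than an application-layer proxy and is often faster.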
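To make the "does not care about the application protocol" point concrete, the sketch below builds the bytes of a SOCKS5 CONNECT request as laid out in RFC 1928. It only constructs the message that follows the initial method negotiation and sends nothing over the network; the function name socks5_connect_request is hypothetical.

```python
import struct


def socks5_connect_request(host: str, port: int) -> bytes:
    """Build a SOCKS5 CONNECT request for a domain name (RFC 1928, section 4)."""
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("domain name too long for a SOCKS5 request")
    return (
        b"\x05"  # VER: SOCKS version 5
        b"\x01"  # CMD: CONNECT
        b"\x00"  # RSV: reserved, must be zero
        b"\x03"  # ATYP: destination is a domain name
        + bytes([len(name)]) + name  # one length byte, then the name itself
        + struct.pack("!H", port)    # destination port in network byte order
    )


print(socks5_connect_request("httpbin.org", 80).hex())
```

The request names only a destination host and port. Nothing in it says whether the bytes that follow will be HTTP, FTP, or anything else, which is why the same proxy can relay any TCP-based protocol.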
The code is set up as follows:
import socks  # provided by the PySocks package
import socket
from urllib import request
from urllib.error import URLError

# Make SOCKS5 the default proxy for every socket created from now on
socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
socket.socket = socks.socksocket

try:
    response = request.urlopen("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("The proxy IP is unusable:", e.reason)
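Assigning to socket.socket replaces the socket class for the whole process, so every later connection is affected, not only the one request. One way to limit the effect is to swap the class inside a with-block and restore it afterwards. The following is a minimal sketch of that pattern; temporary_socket_class is a hypothetical helper, and StandInSocket is a placeholder used so that the sketch runs without PySocks installed. In real use, socks.socksocket would be passed instead.

```python
import contextlib
import socket


@contextlib.contextmanager
def temporary_socket_class(socket_class):
    """Swap socket.socket for socket_class inside a with-block only."""
    original = socket.socket
    socket.socket = socket_class
    try:
        yield
    finally:
        # Always restore, even if the request inside the block raised
        socket.socket = original


class StandInSocket(socket.socket):
    """Placeholder for socks.socksocket so the sketch runs without PySocks."""


original_class = socket.socket
with temporary_socket_class(StandInSocket):
    # Requests made here would use the replacement class
    patched = socket.socket is StandInSocket
restored = socket.socket is original_class
print(patched, restored)
```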
Proxy settings in the requests library
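import requests

proxy = "113.116.50.182:808"

# The scheme in each value describes how to reach the proxy itself,
# so a plain HTTP proxy is written with http:// for both keys
proxies = {
    "http": "http://" + proxy,
    "https": "http://" + proxy,
}

try:
    response = requests.get("http://httpbin.org/ip", proxies=proxies)
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)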
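This is much simpler than configuring a proxy in urllib. For a proxy that requires authentication, the same form works: proxy = "username:password@113.116.50.182:808". It is not demonstrated again here.
To use a SOCKS5 proxy with the requests library, set it up as follows: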
import requests
import socks  # provided by the PySocks package
import socket

# Make SOCKS5 the default proxy for every socket created from now on
socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
socket.socket = socks.socksocket

try:
    response = requests.get("http://httpbin.org/ip")
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)
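requests can also take a SOCKS proxy through its proxies argument, without replacing socket.socket globally, provided the optional SOCKS support is installed (pip install "requests[socks]"). The following is a minimal sketch; the helper names socks5_proxies and fetch_origin are hypothetical, and the address is the same example address used above.

```python
def socks5_proxies(host, port, remote_dns=False):
    """Build a proxies mapping for requests that points at a SOCKS5 proxy."""
    # socks5h asks the proxy to resolve host names; socks5 resolves them locally
    scheme = "socks5h" if remote_dns else "socks5"
    url = f"{scheme}://{host}:{port}"
    return {"http": url, "https": url}


def fetch_origin(proxies, timeout=10):
    """Fetch httpbin.org/ip through the given proxies and return the body."""
    # Imported lazily so the mapping helper works without requests installed
    import requests

    response = requests.get("http://httpbin.org/ip", proxies=proxies, timeout=timeout)
    return response.text


proxies = socks5_proxies("113.116.50.182", 807)
print(proxies)
```

Calling fetch_origin(proxies) performs the actual request. It is left out of the sketch because it needs a reachable proxy.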
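Setting a proxy in Selenium
Since the headless browser PhantomJS is no longer maintained, only Chrome, a browser with a graphical interface, is demonstrated here.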
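from selenium import webdriver

proxy = "113.116.50.182:808"

# Pass the proxy to Chrome as a command-line switch
chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument('--proxy-server=http://' + proxy)

# executable_path is the Selenium 3 way of locating chromedriver;
# Selenium 4 expects a Service object instead
driver = webdriver.Chrome(
    executable_path=r"C:\Users\Administrator\Downloads\chromedriver.exe",
    options=chromeOptions,
)
driver.get("http://httpbin.org/ip")
print(driver.page_source)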
The scraped result is as follows:
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{ "origin": "113.116.50.182, 113.116.50.182" } </pre></body></html>
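Because Chrome wraps the JSON in an HTML page, the origin value has to be pulled out of the pre element before it can be compared with the proxy address. The following is a minimal sketch using only the standard library; the class and function names are hypothetical, and the sample string is the page source shown above.

```python
import json
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the text found inside <pre> elements."""

    def __init__(self):
        super().__init__()
        self._in_pre = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._in_pre = True

    def handle_endtag(self, tag):
        if tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        if self._in_pre:
            self.chunks.append(data)


def origin_from_page_source(page_source):
    """Return the "origin" value from the JSON that Chrome rendered in <pre>."""
    parser = PreTextExtractor()
    parser.feed(page_source)
    parser.close()  # flush any text the parser is still buffering
    return json.loads("".join(parser.chunks))["origin"]


page_source = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body>'
    '<pre style="word-wrap: break-word; white-space: pre-wrap;">'
    '{ "origin": "113.116.50.182, 113.116.50.182" } </pre></body></html>'
)
print(origin_from_page_source(page_source))
```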
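Note: the chrome_options keyword argument of webdriver.Chrome is deprecated; pass the options object through the options keyword instead.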
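In Selenium 4 the driver location is likewise passed through a Service object rather than the executable_path keyword. The following is a minimal sketch under that assumption; the helper names proxy_server_argument and make_chrome_driver are hypothetical, and the driver path must be supplied by the caller.

```python
def proxy_server_argument(proxy, scheme="http"):
    """Build the --proxy-server switch that Chrome reads at startup."""
    return f"--proxy-server={scheme}://{proxy}"


def make_chrome_driver(proxy, driver_path):
    """Start Chrome behind a proxy using the Selenium 4 calling convention."""
    # Imported lazily so the helper above can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_server_argument(proxy))
    return webdriver.Chrome(service=Service(executable_path=driver_path), options=options)


print(proxy_server_argument("113.116.50.182:808"))
```

make_chrome_driver is not called in the sketch because it needs a local Chrome and chromedriver.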
Using an authenticated proxy in Selenium is a little more involved. The following code can be adapted directly whenever it is needed:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import zipfile

# Proxy address and credentials (placeholders)
ip = '113.116.50.182'
port = 808
username = 'xxxx'
password = 'xxxx'

# Extension manifest: requests the permissions needed to set the proxy
# and to answer the proxy's authentication challenge
manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    }
}
"""

# Background script: configures a fixed proxy and supplies the credentials
background_js = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "%(ip)s",
            port: %(port)s
        }
    }
}

chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
    return {
        authCredentials: {
            username: "%(username)s",
            password: "%(password)s"
        }
    }
}

chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ['blocking']
)
""" % {'ip': ip, 'port': port, 'username': username, 'password': password}

# Pack the two files into a zip archive that Chrome can load as an extension
plugin_file = 'proxy_auth_plugin.zip'
with zipfile.ZipFile(plugin_file, 'w') as zp:
    zp.writestr("manifest.json", manifest_json)
    zp.writestr("background.js", background_js)

chrome_options = Options()
chrome_options.add_argument("--start-maximized")
chrome_options.add_extension(plugin_file)

browser = webdriver.Chrome(
    executable_path=r"C:\Users\Administrator\Downloads\chromedriver.exe",
    options=chrome_options,
)
browser.get('http://httpbin.org/ip')
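Substituting the credentials straight into the JavaScript source breaks if the username or password contains a double quote or a backslash. One option is to let json.dumps produce the string literals, since its output is also a valid JavaScript string. The following is a minimal sketch of the packaging step only; the function name build_proxy_auth_extension and the sample password are assumptions, and the archive is written to a temporary directory so the sketch leaves no files behind.

```python
import json
import os
import tempfile
import zipfile

MANIFEST = {
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": ["proxy", "tabs", "unlimitedStorage", "storage",
                    "<all_urls>", "webRequest", "webRequestBlocking"],
    "background": {"scripts": ["background.js"]},
}

# Placeholders are filled with ready-made literals, so no quotes appear here
BACKGROUND_TEMPLATE = """
var config = {
    mode: "fixed_servers",
    rules: {singleProxy: {scheme: "http", host: %(host)s, port: %(port)d}}
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
chrome.webRequest.onAuthRequired.addListener(
    function(details) {
        return {authCredentials: {username: %(username)s, password: %(password)s}};
    },
    {urls: ["<all_urls>"]},
    ['blocking']
);
"""


def build_proxy_auth_extension(ip, port, username, password, path):
    """Write the proxy-auth extension archive to path and return path."""
    background_js = BACKGROUND_TEMPLATE % {
        "host": json.dumps(ip),            # quoted and escaped string literal
        "port": int(port),
        "username": json.dumps(username),
        "password": json.dumps(password),
    }
    with zipfile.ZipFile(path, "w") as zp:
        zp.writestr("manifest.json", json.dumps(MANIFEST, indent=4))
        zp.writestr("background.js", background_js)
    return path


with tempfile.TemporaryDirectory() as tmp:
    # 'pa"ss' is a made-up password that contains a double quote
    archive = build_proxy_auth_extension(
        "113.116.50.182", 808, "xxxx", 'pa"ss',
        os.path.join(tmp, "proxy_auth_plugin.zip"),
    )
    with zipfile.ZipFile(archive) as zp:
        names = sorted(zp.namelist())
        background = zp.read("background.js").decode()
print(names)
```

The returned path can be handed to add_extension in the same way as plugin_file in the code above.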