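Proxy settings
Using a proxy with the urllib library looks like this:
from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

proxy = "113.116.50.182:808"

# Map each URL scheme to the proxy that should handle it
proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "https://" + proxy,
})
opener = build_opener(proxy_handler)

try:
    # httpbin.org/ip echoes the IP address the request came from
    response = opener.open("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("The proxy IP is not usable")
If the output looks like the following, the proxy was set up successfully: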
{ "origin": "113.116.50.182, 113.116.50.182" }
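The visual check above can also be done in code. The following is a minimal sketch, assuming the proxy shows up in httpbin's "origin" field the way it does in the output above; the helper names `origin_ips` and `proxy_is_active` are hypothetical and not part of any library.

```python
import json


def origin_ips(body):
    """Return the IP addresses listed in the "origin" field of an httpbin.org/ip response."""
    origin = json.loads(body)["origin"]
    return [ip.strip() for ip in origin.split(",")]


def proxy_is_active(body, proxy):
    """Return True if the proxy's host appears among the reported origin IPs.

    `proxy` uses the same "host:port" (or "user:pass@host:port") form as in the text.
    """
    host_port = proxy.rsplit("@", 1)[-1]   # drop optional credentials
    host = host_port.split(":", 1)[0]      # drop the port
    return host in origin_ips(body)


if __name__ == "__main__":
    sample = '{ "origin": "113.116.50.182, 113.116.50.182" }'
    print(proxy_is_active(sample, "113.116.50.182:808"))
```

A proxy that forwards the client address may list other IPs next to its own, so this check only confirms that the request passed through the proxy.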
For a proxy that requires authentication, only the proxy variable needs to change: put the username and password in front of the proxy address, as in "username:password@113.116.50.182:808".
from urllib.request import ProxyHandler, build_opener
from urllib.error import URLError

# Credentials go before the host, separated from it by "@"
proxy = "username:password@113.116.50.182:808"

proxy_handler = ProxyHandler({
    "http": "http://" + proxy,
    "https": "https://" + proxy,
})
opener = build_opener(proxy_handler)

try:
    response = opener.open("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("The proxy IP is not usable")
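One practical detail with the "username:password@host:port" form: a password containing characters such as "@" or ":" would break the URL unless it is percent-encoded. Below is a small sketch of a builder that does this; `proxy_url` is a hypothetical helper name, and the credentials are placeholders.

```python
from urllib.parse import quote


def proxy_url(host, port, username=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding the credentials if any are given."""
    if username is None:
        return f"{scheme}://{host}:{port}"
    user = quote(username, safe="")
    pwd = quote(password or "", safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


if __name__ == "__main__":
    # Placeholder credentials, for illustration only
    print(proxy_url("113.116.50.182", 808, "username", "p@ss:word"))
```

The resulting string can be used as a value in the dictionary passed to ProxyHandler.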
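If you run into a SOCKS proxy server:
A proxy server that speaks the SOCKS protocol is called a SOCKS server, and it is a general-purpose kind of proxy. SOCKS is a low-level, circuit-level gateway developed by David Koblas in 1990, and it has since been an open standard published as Internet RFCs. SOCKS does not tie applications to a particular operating system platform. Unlike application-layer proxies such as HTTP proxies, a SOCKS proxy simply relays the data and does not care which application protocol is in use (FTP, HTTP, or NNTP requests, for example). Because it does not parse the application protocol, a SOCKS proxy usually adds less overhead than an application-layer proxy.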
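To make the "it only relays data" point concrete, here is a sketch of the first two messages a SOCKS5 client sends, following the layout in RFC 1928. The function names are hypothetical, and only the no-authentication method and domain-name addressing are covered.

```python
import struct

SOCKS_VERSION = 5      # VER field for SOCKS5
METHOD_NO_AUTH = 0x00  # "no authentication required"
CMD_CONNECT = 0x01     # open a TCP connection to the target
ATYP_DOMAIN = 0x03     # target address is given as a domain name


def client_greeting():
    """VER, NMETHODS, METHODS: offer a single method, no authentication."""
    return bytes([SOCKS_VERSION, 1, METHOD_NO_AUTH])


def connect_request(host, port):
    """VER, CMD, RSV, ATYP, then length-prefixed host and a big-endian port."""
    host_bytes = host.encode("ascii")
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAIN, len(host_bytes)])
    return header + host_bytes + struct.pack("!H", port)


if __name__ == "__main__":
    print(client_greeting().hex())
    print(connect_request("httpbin.org", 80).hex())
```

Nothing in these messages names HTTP, FTP, or any other application protocol: the request carries only a target host and port, and after the proxy replies, the bytes of the application protocol pass through unchanged.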
The code is set up as follows:
import socks   # third-party PySocks package (pip install PySocks)
import socket
from urllib import request
from urllib.error import URLError

# Route every new socket through the SOCKS5 proxy
socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
socket.socket = socks.socksocket

try:
    response = request.urlopen("http://httpbin.org/ip")
    print(response.read().decode())
except URLError as e:
    print("The proxy IP is not usable")
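Replacing socket.socket as above affects every connection the process opens afterwards. If only some requests should go through the proxy, one option is to apply the patch temporarily. The following is a sketch of that idea using only the standard library; `patched_socket` is a hypothetical helper, and it is not thread-safe because the patch is still process-wide while it is active.

```python
import contextlib
import socket


@contextlib.contextmanager
def patched_socket(socket_class):
    """Temporarily replace socket.socket, restoring the original on exit."""
    original = socket.socket
    socket.socket = socket_class
    try:
        yield
    finally:
        # Restore even if the body raised an exception
        socket.socket = original


# Intended usage with PySocks (assumes socks.set_default_proxy was already called):
#
#     with patched_socket(socks.socksocket):
#         response = request.urlopen("http://httpbin.org/ip")
```

Outside the with block, new connections go direct again.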
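Proxy settings in the requests library
import requests

proxy = "113.116.50.182:808"

# Assumes a plain HTTP proxy: requests then talks to the proxy over http://
# for both http:// and https:// targets
proxies = {
    "http": "http://" + proxy,
    "https": "http://" + proxy,
}

try:
    response = requests.get("http://httpbin.org/ip", proxies=proxies)
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)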
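This is much simpler than setting up a proxy in urllib. For a proxy that requires authentication, proxy = "username:password@113.116.50.182:808" works here as well, so it is not demonstrated again.
To use a SOCKS5 proxy with the requests library, set it up as follows:
import requests
import socks   # third-party PySocks package (pip install PySocks)
import socket

# Route every new socket through the SOCKS5 proxy
socks.set_default_proxy(socks.SOCKS5, "113.116.50.182", 807)
socket.socket = socks.socksocket

try:
    response = requests.get("http://httpbin.org/ip")
    print(response.text)
except requests.exceptions.ConnectionError as e:
    print("Error", e.args)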
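As an alternative to patching socket.socket globally, requests also accepts socks5:// URLs in the proxies dictionary once its SOCKS extra is installed (pip install "requests[socks]"). The helper below is a sketch that only builds that dictionary; `socks_proxies` is a hypothetical name.

```python
def socks_proxies(host, port, remote_dns=False):
    """Build a requests-style proxies dict for a SOCKS5 proxy.

    With remote_dns=True the "socks5h" scheme is used, which asks the proxy
    to resolve hostnames instead of resolving them locally.
    """
    scheme = "socks5h" if remote_dns else "socks5"
    url = f"{scheme}://{host}:{port}"
    return {"http": url, "https": url}


# Intended usage (needs the requests SOCKS extra installed):
#
#     import requests
#     response = requests.get("http://httpbin.org/ip",
#                             proxies=socks_proxies("113.116.50.182", 807))
#     print(response.text)
```

This keeps the proxy scoped to the calls that pass the dictionary, and other sockets in the process are left alone.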
Setting a proxy in Selenium
Because the headless browser PhantomJS is no longer maintained, only Chrome, a browser with a visible window, is demonstrated here.
from selenium import webdriver

proxy = "113.116.50.182:808"

# Pass the proxy to Chrome as a command-line switch
chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument('--proxy-server=http://' + proxy)

driver = webdriver.Chrome(
    executable_path=r"C:\Users\Administrator\Downloads\chromedriver.exe",
    options=chromeOptions,
)
driver.get("http://httpbin.org/ip")
print(driver.page_source)
The page source that comes back is:
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{ "origin": "113.116.50.182, 113.116.50.182" } </pre></body></html>
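Note: the options object must now be passed to webdriver.Chrome through the options keyword argument, which replaces the older chrome_options keyword.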
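A similar API change affects the driver path: in Selenium 4 the executable_path argument was deprecated in favor of a Service object and was removed in later 4.x releases, which applies to both Selenium examples in this section. The following is a sketch for newer versions, assuming Selenium 4 and a local chromedriver; `proxy_argument` and `make_chrome` are hypothetical helper names.

```python
def proxy_argument(proxy, scheme="http"):
    """Build the Chrome command-line switch for a "host:port" proxy."""
    return f"--proxy-server={scheme}://{proxy}"


def make_chrome(proxy, driver_path):
    """Start Chrome through a Selenium 4 Service with the proxy switch applied."""
    # Imported here so proxy_argument can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(proxy))
    return webdriver.Chrome(service=Service(executable_path=driver_path), options=options)


# Intended usage (the driver path is the one used elsewhere in this section):
#
#     driver = make_chrome("113.116.50.182:808",
#                          r"C:\Users\Administrator\Downloads\chromedriver.exe")
#     driver.get("http://httpbin.org/ip")
#     print(driver.page_source)
```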
Using an authenticated proxy in Selenium is a little more involved. The code below can be reused by changing only the proxy details at the top.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import zipfile

ip = '113.116.50.182'
port = 808
username = 'xxxx'
password = 'xxxx'

# Manifest of a small Chrome extension that is allowed to set the proxy
# and to answer proxy authentication requests
manifest_json = """
{
    "version": "1.0.0",
    "manifest_version": 2,
    "name": "Chrome Proxy",
    "permissions": [
        "proxy",
        "tabs",
        "unlimitedStorage",
        "storage",
        "<all_urls>",
        "webRequest",
        "webRequestBlocking"
    ],
    "background": {
        "scripts": ["background.js"]
    }
}
"""

# Background script: configure the proxy, then supply the credentials
# whenever the proxy asks for them
background_js = """
var config = {
    mode: "fixed_servers",
    rules: {
        singleProxy: {
            scheme: "http",
            host: "%(ip)s",
            port: %(port)s
        }
    }
}

chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

function callbackFn(details) {
    return {
        authCredentials: {
            username: "%(username)s",
            password: "%(password)s"
        }
    }
}

chrome.webRequest.onAuthRequired.addListener(
    callbackFn,
    {urls: ["<all_urls>"]},
    ['blocking']
)
""" % {'ip': ip, 'port': port, 'username': username, 'password': password}

# Package the two files as an extension archive
plugin_file = 'proxy_auth_plugin.zip'
with zipfile.ZipFile(plugin_file, 'w') as zp:
    zp.writestr("manifest.json", manifest_json)
    zp.writestr("background.js", background_js)

chrome_options = Options()
chrome_options.add_argument("--start-maximized")
chrome_options.add_extension(plugin_file)

browser = webdriver.Chrome(
    executable_path=r"C:\Users\Administrator\Downloads\chromedriver.exe",
    options=chrome_options,
)
browser.get('http://httpbin.org/ip')