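Mimvp Proxy (米撲代理) is a leading global proxy brand that has focused on the proxy industry for nearly ten years. It offers open, private, and dedicated proxies, with a free trial.
Mimvp Proxy website: https://proxy.mimvp.com
The examples in this article were developed specifically for Mimvp's private, dedicated, and open proxies.
They cover three kinds of HTTP/HTTPS proxy authorization: no password, whitelisted IP, and username/password.
The plugin xpi used in the examples can be downloaded from the Mimvp Proxy website or from Mimvp's official GitHub.
The complete code is given directly below, and all of it has been tested and verified. See the comments for details.
Runtime environment for the examples: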
MacBook Pro MacOS High Sierra Version 10.13.4
Google Chrome Version 63.0.3239.84 (Official Build) (64-bit)
Python 2.7.13 (v2.7.13:a06454b1afa1, Dec 17 2016, 12:39:47)
$ pip list | grep selenium
selenium (3.4.2)
chromedriver download: http://chromedriver.storage.googleapis.com/index.html
Python + Selenium + Chrome
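Error: WebDriverException: 'chromedriver' executable needs to be in PATH
Solution:
a. Download ChromeDriver; for other browsers, see the official documentation
b. Copy the chromedriver file into the Google Chrome program directory, or into a directory listed in the PATH environment variable
cp chromedriver /usr/local/bin/
The paths for each operating system are listed in the official Wiki
When creating the webdriver object in Python, pass the chromedriver path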
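Example 1: macOS + Chrome program directory
    from selenium import webdriver

    url = 'https://proxy.mimvp.com/ip.php'   # page to fetch (the test URL used later in this article)
    chromedriver = "/Applications/Google Chrome.app/Contents/MacOS/chromedriver"
    browser = webdriver.Chrome(executable_path=chromedriver)   # launch the Chrome browser
    browser.get(url)
    content = browser.page_source
    print("content: " + str(content))
    browser.quit()   # close the browser when done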
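Example 2: macOS + PATH environment variable
    from selenium import webdriver
    from pyvirtualdisplay import Display

    def spider_url_chrome(url):
        browser = None
        display = None
        try:
            # Start a virtual display so Chrome runs without a visible window
            display = Display(visible=0, size=(800, 600))
            display.start()
            chromedriver = '/usr/local/bin/chromedriver'
            browser = webdriver.Chrome(executable_path=chromedriver)   # launch the Chrome browser
            browser.get(url)
            content = browser.page_source
            print("content: " + str(content))
        finally:
            # Always release the browser and the display, even on error
            if browser: browser.quit()
            if display: display.stop()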
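Before creating the driver, it can help to confirm that a chromedriver binary is actually reachable, since that is exactly what the error message above complains about. The sketch below is only an illustration and not part of the original examples: the helper name `find_chromedriver` and its fallback list are assumptions, and `shutil.which` requires Python 3.3 or later (the environment listed above is Python 2.7).

```python
import os
import shutil


def find_chromedriver(candidates=("/usr/local/bin/chromedriver",), name="chromedriver"):
    """Return a usable chromedriver path, or None if none is found.

    Looks on PATH first, then falls back to the explicit candidate paths.
    """
    found = shutil.which(name)
    if found:
        return found
    for path in candidates:
        # The file must exist and be executable to be usable by Selenium.
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    path = find_chromedriver()
    if path is None:
        print("chromedriver not found; download it and copy it to a directory on PATH")
    else:
        print("chromedriver found at: " + path)
```

The returned path can then be passed as `executable_path` in the same way as in the two examples above.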
Selenium + chromedriver with a proxy: no password, or whitelisted IP
    from selenium import webdriver
    from pyvirtualdisplay import Display

    ## webdriver + chrome + proxy + whiteip (no password, or whitelisted-IP authorization)
    ## Mimvp Proxy: https://proxy.mimvp.com
    def spider_url_chrome_by_whiteip(url):
        browser = None
        display = None

        ## For whitelisted IPs, see the Mimvp Proxy member center: https://proxy.mimvp.com/usercenter/userinfo.php?p=whiteip
        mimvp_proxy = {
            'ip'            : '140.143.62.84',   # ip
            'port_https'    : 62288,             # http, https
            'port_socks'    : 62287,             # socks5
            'username'      : 'mimvp-user',
            'password'      : 'mimvp-pass'
        }

        try:
            display = Display(visible=0, size=(800, 600))
            display.start()

            # Options() from selenium.webdriver.chrome.options also works here
            chrome_options = webdriver.ChromeOptions()

            # http, https (no password, or whitelisted-IP authorization): success
            proxy_https_argument = '--proxy-server=http://{ip}:{port}'.format(ip=mimvp_proxy['ip'], port=mimvp_proxy['port_https'])
            chrome_options.add_argument(proxy_https_argument)

            # socks5 (no password, or whitelisted-IP authorization): failed
            # proxy_socks_argument = '--proxy-server=socks5://{ip}:{port}'.format(ip=mimvp_proxy['ip'], port=mimvp_proxy['port_socks'])
            # chrome_options.add_argument(proxy_socks_argument)

            chromedriver = '/usr/local/bin/chromedriver'
            browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=chrome_options)   # launch the Chrome browser
            browser.get(url)
            content = browser.page_source
            print("content: " + str(content))
        finally:
            if browser: browser.quit()
            if display: display.stop()
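The `--proxy-server` argument in the function above is just a formatted string, so it can be built and checked separately from the browser. The following is a minimal sketch under that assumption; the helper name `proxy_server_argument` and its validation rules are illustrative and do not come from the original code.

```python
def proxy_server_argument(ip, port, scheme="http"):
    """Build the string passed to Chrome via chrome_options.add_argument()."""
    if scheme not in ("http", "https", "socks4", "socks5"):
        raise ValueError("unsupported proxy scheme: " + scheme)
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    return "--proxy-server={scheme}://{ip}:{port}".format(scheme=scheme, ip=ip, port=port)


if __name__ == "__main__":
    # Same IP and port as the whitelisted-IP example above.
    print(proxy_server_argument("140.143.62.84", 62288))
```

Keeping the formatting in one place means a malformed port or scheme is rejected before Chrome is launched rather than showing up as a failed page load.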
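Selenium + chromedriver with a proxy: http/https with username and password
This example uses Mimvp Proxy's username/password authorization
To obtain the username and password, go to Mimvp Proxy - Member Center - Whitelist IP
1. Create a zip archive named proxy.zip that contains the two files background.js and manifest.json
1) background.js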
    var config = {
        mode: "fixed_servers",
        rules: {
            singleProxy: {
                scheme: "http",
                host: "140.143.62.84",
                port: 19480
            },
            bypassList: ["mimvp.com"]
        }
    };

    chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

    function callbackFn(details) {
        return {
            authCredentials: {
                username: "mimvp-user",
                password: "mimvp-pass"
            }
        };
    }

    chrome.webRequest.onAuthRequired.addListener(
        callbackFn,
        {urls: ["<all_urls>"]},
        ['blocking']
    );
Note: in the configuration above, replace the proxy ip, port, username, and password with your Mimvp Proxy ip:port and your authorized username and password
2) manifest.json
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version":"22.0.0"
    }
Note: the configuration above needs no changes; copy it as is
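The archive can be built with any zip tool; one option is Python's standard `zipfile` module. A minimal sketch, assuming the two files sit together in one directory; the function name `build_proxy_zip` and the directory argument are illustrative and not from the original article.

```python
import os
import zipfile


def build_proxy_zip(src_dir, zip_path="proxy.zip"):
    """Pack background.js and manifest.json from src_dir into zip_path."""
    names = ("manifest.json", "background.js")
    for name in names:
        if not os.path.isfile(os.path.join(src_dir, name)):
            raise IOError("missing file: " + os.path.join(src_dir, name))
    with zipfile.ZipFile(zip_path, mode="w") as zf:
        for name in names:
            # Store each file at the root of the archive, next to manifest.json.
            zf.write(os.path.join(src_dir, name), name)
    return zip_path


if __name__ == "__main__":
    # Assumes both files are in the current directory.
    print(build_proxy_zip("."))
```

Both files are written at the top level of the archive rather than inside a subfolder, matching the flat layout described in step 1.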
2. Add proxy.zip to Chrome as an extension
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from pyvirtualdisplay import Display
    # from xvfbwrapper import Xvfb

    def spider_url_chrome_by_https(url):
        browser = None
        display = None
        try:
            display = Display(visible=0, size=(800, 600))
            display.start()

            chrome_options = Options()
            chrome_options.add_extension("proxy.zip")   # load the proxy extension built in step 1

            chromedriver = '/usr/local/bin/chromedriver'
            browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=chrome_options)   # launch the Chrome browser
            browser.get(url)
            content = browser.page_source
            print("content: " + str(content))
        finally:
            if browser: browser.quit()
            if display: display.stop()

    if __name__ == '__main__':
        # Alternative test URLs:
        # url = 'https://ip.cn'
        # url = 'https://mimvp.com/'
        url = 'https://proxy.mimvp.com/ip.php'   # http, https with password authorization: success
        spider_url_chrome_by_https(url)
3. Result: verified successfully
content: <html xmlns="http://www.w3.org/1999/xhtml"><head></head><body>140.143.62.84</body></html>
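A quick way to confirm that the request really went through the proxy is to pull the IP address out of the returned page and compare it with the proxy's IP. A minimal sketch, assuming the page body contains the exit IP as plain text, as in the output above; `extract_ip` is a hypothetical helper name.

```python
import re


def extract_ip(page_source):
    """Return the first IPv4-looking address found in the page source, or None."""
    m = re.search(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", page_source)
    return m.group(0) if m else None


content = '<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body>140.143.62.84</body></html>'
proxy_ip = "140.143.62.84"
print(extract_ip(content) == proxy_ip)   # prints True
```

If the comparison is False, the page was most likely fetched directly rather than through the proxy.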
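Selenium + Chrome Driver with an HTTP proxy that requires username/password authentication (improved version)
By default, Chrome's --proxy-server="http://ip:port" argument does not support setting a username and password.
As a result, "Selenium + Chrome Driver" cannot use an HTTP proxy that requires HTTP Basic Authentication.
One workaround is IP address authentication. Mimvp Proxy's whitelisted-IP authorization is a form of IP address authentication; see Mimvp Proxy - Member Center - Whitelist IP
However, on networks in China most users connect over ADSL, where the IP address changes (the ISP reassigns it dynamically), so binding authorization to an IP address does not work.
A way to make Chrome authenticate to an HTTP proxy with a username and password automatically is therefore needed.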
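For context, HTTP Basic authentication to a proxy means that the client sends a Proxy-Authorization header whose value is the word Basic followed by the Base64 encoding of username:password. The sketch below builds that header in Python purely to illustrate what --proxy-server gives no way to supply; the function name is hypothetical and nothing here is part of Selenium or Chrome.

```python
import base64


def proxy_authorization_header(username, password):
    """Return the (name, value) pair for HTTP Basic proxy authentication."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return ("Proxy-Authorization", "Basic " + token)


if __name__ == "__main__":
    # Placeholder credentials, as used elsewhere in this article.
    name, value = proxy_authorization_header("mimvp-user", "mimvp-pass")
    print(name + ": " + value)
```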
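Someone on Stack Overflow shared a neat approach that uses a Chrome extension to handle proxy username/password authentication automatically.
Details: how-to-override-basic-authentication-in-selenium2-with-java-using-chrome-driver
Building on that idea, Mimvp Proxy's engineers used Python to automate the creation of the Chrome extension:
given a proxy in the form "username:password@ip:port", a Chrome proxy extension is generated automatically,
and installing that extension in "Selenium + Chrome Driver" then configures the proxy.
The code is as follows:
1. Create the template folder Chrome-proxy-helper
Create the following in order, with both files placed inside the template folder:
1) Create the template folder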
Chrome-proxy-helper
2) Create background.js
vim Chrome-proxy-helper/background.js
    var config = {
        mode: "fixed_servers",
        rules: {
            singleProxy: {
                scheme: "http",
                host: "mimvp_proxy_host",
                port: parseInt(mimvp_proxy_port)
            },
            bypassList: ["mimvp.com"]
        }
    };

    chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});

    function callbackFn(details) {
        return {
            authCredentials: {
                username: "mimvp_username",
                password: "mimvp_password"
            }
        };
    }

    chrome.webRequest.onAuthRequired.addListener(
        callbackFn,
        {urls: ["<all_urls>"]},
        ['blocking']
    );
3) Create manifest.json
vim Chrome-proxy-helper/manifest.json
    {
        "version": "1.0.0",
        "manifest_version": 2,
        "name": "Chrome Proxy",
        "permissions": [
            "proxy",
            "tabs",
            "unlimitedStorage",
            "storage",
            "<all_urls>",
            "webRequest",
            "webRequestBlocking"
        ],
        "background": {
            "scripts": ["background.js"]
        },
        "minimum_chrome_version":"22.0.0"
    }
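Since background.js calls chrome.proxy and registers a blocking chrome.webRequest.onAuthRequired listener for all URLs, the manifest has to declare the matching permissions. The sketch below checks a manifest for them before it is packaged; the list in `REQUIRED_PERMISSIONS` reflects my reading of what the script uses and is an assumption, as is the helper name.

```python
import json

# Permissions that background.js appears to rely on (assumed minimal set).
REQUIRED_PERMISSIONS = ("proxy", "webRequest", "webRequestBlocking", "<all_urls>")


def missing_permissions(manifest_text):
    """Return the required permissions that the manifest does not declare."""
    manifest = json.loads(manifest_text)
    declared = set(manifest.get("permissions", []))
    return [p for p in REQUIRED_PERMISSIONS if p not in declared]


if __name__ == "__main__":
    sample = '{"manifest_version": 2, "permissions": ["proxy", "webRequest"]}'
    print(missing_permissions(sample))
```

An empty list means the manifest declares everything in the assumed set; the manifest shown above passes this check.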
2. Create the function that builds the zip
In the Python script, create the function that builds the zip archive
    import os, re, zipfile

    def get_chrome_proxy_extension(proxy):
        """Return the path of a Chrome proxy extension configured for the given proxy (with username/password authentication)
        proxy - the proxy to use, format: username:password@ip:port
        """

        # Reference template for the Chrome proxy extension: https://github.com/RobinDev/Selenium-Chrome-HTTP-Private-Proxy
        CHROME_PROXY_HELPER_DIR = 'Chrome-proxy-helper'   # custom directory name, placed in the same directory as this script

        # Directory that stores the generated Chrome proxy extensions, normally in the same directory as this script
        # Generated zip path, e.g.: chrome-proxy-extensions/mimvp-user_mimvp-pass@140.143.62.84_19480.zip
        CUSTOM_CHROME_PROXY_EXTENSIONS_DIR = 'chrome-proxy-extensions'

        m = re.compile(r'([^:]+):([^@]+)@([\d.]+):(\d+)').search(proxy)
        if m:
            # Extract the proxy parameters
            username, password, ip, port = m.groups()

            # Create a customized Chrome proxy extension (zip file)
            if not os.path.exists(CUSTOM_CHROME_PROXY_EXTENSIONS_DIR):
                os.mkdir(CUSTOM_CHROME_PROXY_EXTENSIONS_DIR)
            extension_file_path = os.path.join(CUSTOM_CHROME_PROXY_EXTENSIONS_DIR, '{}.zip'.format(proxy.replace(':', '_')))

            # Build the extension only if it does not exist yet
            if not os.path.exists(extension_file_path):
                # Read the template and substitute the proxy parameters
                with open(os.path.join(CHROME_PROXY_HELPER_DIR, 'background.js')) as f:
                    background_content = f.read()
                background_content = background_content.replace('mimvp_proxy_host', ip)
                background_content = background_content.replace('mimvp_proxy_port', port)
                background_content = background_content.replace('mimvp_username', username)
                background_content = background_content.replace('mimvp_password', password)

                zf = zipfile.ZipFile(extension_file_path, mode='w')
                zf.write(os.path.join(CHROME_PROXY_HELPER_DIR, 'manifest.json'), 'manifest.json')
                zf.writestr('background.js', background_content)
                zf.close()
            return extension_file_path
        else:
            raise Exception('Invalid proxy format. Should be username:password@ip:port')
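The regular expression in the function above expects a numeric IPv4 host, so a proxy given by hostname is rejected. To see how a proxy string is split, the parsing step can be isolated as in the sketch below. It is an illustration rather than the original code: the names `ProxyParts` and `parse_proxy` are hypothetical, and unlike the original (which uses an unanchored search) this version anchors the pattern to the whole string and converts the port to an integer.

```python
import re
from collections import namedtuple

ProxyParts = namedtuple("ProxyParts", "username password ip port")

# Same four groups as the pattern above, but anchored at both ends (assumption).
PROXY_RE = re.compile(r"^([^:]+):([^@]+)@([\d.]+):(\d+)$")


def parse_proxy(proxy):
    """Split 'username:password@ip:port' into its four parts."""
    m = PROXY_RE.match(proxy)
    if not m:
        raise ValueError("Invalid proxy format. Should be username:password@ip:port")
    username, password, ip, port = m.groups()
    return ProxyParts(username, password, ip, int(port))


if __name__ == "__main__":
    print(parse_proxy("mimvp-user:mimvp-pass@140.143.62.84:19480"))
```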
3. Write the Python function that uses the proxy
    from selenium import webdriver
    from pyvirtualdisplay import Display

    ## webdriver + chrome + proxy + https (https password authorization, zip built automatically)
    ## Mimvp Proxy: https://proxy.mimvp.com
    def spider_url_chrome_by_https2(url):
        browser = None
        display = None
        try:
            display = Display(visible=0, size=(800, 600))
            display.start()

            proxy = 'mimvp-guest:welcome2mimvp@140.143.62.84:19480'
            # Options() from selenium.webdriver.chrome.options also works here
            chrome_options = webdriver.ChromeOptions()
            chrome_options.add_extension(get_chrome_proxy_extension(proxy))   # build (or reuse) the zip and load it

            chromedriver = '/usr/local/bin/chromedriver'
            browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=chrome_options)   # launch the Chrome browser
            browser.get(url)
            content = browser.page_source
            print("content: " + str(content))
        finally:
            if browser: browser.quit()
            if display: display.stop()

    if __name__ == '__main__':
        # Alternative test URLs:
        # url = 'https://ip.cn'
        # url = 'https://mimvp.com/'
        url = 'https://proxy.mimvp.com/ip.php'   # http, https with password authorization: success
        spider_url_chrome_by_https2(url)
4. Result: verified successfully
content: <html xmlns="http://www.w3.org/1999/xhtml"><head></head><body>140.143.62.84</body></html>
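5. Summary
With a template and a script that builds the zip file automatically, proxies can be switched dynamically at run time, which makes it possible to use Mimvp proxies flexibly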
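One way to picture the dynamic use is to hand out proxies to URLs in round-robin order and let each proxy map to its own zip file. The sketch below is only an illustration under several assumptions: the helper names are hypothetical, the proxy addresses are placeholders from the documentation range 192.0.2.0/24, the URLs are placeholders too, and `extension_zip_name` merely mirrors the naming rule described above (':' replaced by '_', '.zip' appended) without creating any file.

```python
import itertools


def extension_zip_name(proxy):
    """Mirror the naming rule above: ':' becomes '_' and '.zip' is appended."""
    return "{}.zip".format(proxy.replace(":", "_"))


def assign_proxies(urls, proxies):
    """Pair each URL with a proxy, cycling through the proxies in order."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    pool = itertools.cycle(proxies)
    return [(url, next(pool)) for url in urls]


if __name__ == "__main__":
    # Placeholder proxies and URLs, for illustration only.
    proxies = ["user:pass@192.0.2.10:19480", "user:pass@192.0.2.11:19480"]
    urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    for url, proxy in assign_proxies(urls, proxies):
        print(url + " -> " + extension_zip_name(proxy))
```

In a real run, each proxy string would be passed to the zip-building function so that every browser session loads the extension for its assigned proxy.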
Selenium + chromedriver with a proxy: socks5 is not supported (it did not succeed in Mimvp's tests)
    from selenium import webdriver
    from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
    from pyvirtualdisplay import Display

    ## webdriver + chrome + proxy + socks (socks password authorization; did not succeed in Mimvp's tests)
    ## Mimvp Proxy: https://proxy.mimvp.com
    def spider_url_chrome_by_socks(url):
        browser = None
        display = None

        ## For whitelisted IPs, see the Mimvp Proxy member center: https://proxy.mimvp.com/usercenter/userinfo.php?p=whiteip
        mimvp_proxy = {
            'ip'            : '140.143.62.84',   # ip
            'port_https'    : 62288,             # http, https
            'port_socks'    : 62289,             # socks5
            'username'      : 'mimvp-user',
            'password'      : 'mimvp-pass'
        }

        try:
            display = Display(visible=0, size=(800, 600))
            display.start()

            # Describe the proxy through the driver capabilities instead of a command-line switch
            capabilities = dict(DesiredCapabilities.CHROME)
            capabilities['proxy'] = {
                'proxyType'     : 'MANUAL',
                # 'httpProxy'   : mimvp_proxy['ip'] + ":" + str(mimvp_proxy['port_https']),
                # 'sslProxy'    : mimvp_proxy['ip'] + ":" + str(mimvp_proxy['port_https']),
                'socksProxy'    : mimvp_proxy['ip'] + ":" + str(mimvp_proxy['port_socks']),
                'ftpProxy'      : mimvp_proxy['ip'] + ":" + str(mimvp_proxy['port_https']),
                'noProxy'       : 'localhost,127.0.0.1',
                'class'         : "org.openqa.selenium.Proxy",
                'autodetect'    : False
            }
            capabilities['proxy']['socksUsername'] = mimvp_proxy['username']
            capabilities['proxy']['socksPassword'] = mimvp_proxy['password']

            chromedriver = '/usr/local/bin/chromedriver'
            browser = webdriver.Chrome(chromedriver, desired_capabilities=capabilities)
            browser.get(url)
            content = browser.page_source
            print("content: " + str(content))
        finally:
            if browser: browser.quit()
            if display: display.stop()
For the complete proxy examples, see the Mimvp Proxy usage examples:
https://proxy.mimvp.com/demo2.php (Selenium Python)
For more proxy examples, see Mimvp Proxy's official GitHub:
https://github.com/mimvp/mimvp-proxy-demo
All proxy IPs tested in this article come from Mimvp Proxy.
Additional notes:
Chrome-proxy-helper has an official version:
https://github.com/sunboy-2050/Chrome-proxy-helper
Introduction
By default, Chrome uses the system proxy settings (the IE proxy settings on Windows), but sometimes we want to set a proxy ONLY for Chrome, not for the whole system.
The Chrome proxy helper extension uses Chrome's native proxy API to set the proxy. It supports the socks5, socks4, http, and https protocols as well as PAC scripts. Fast and simple.
Features
- support socks4, socks5, http, https proxy settings
- support pac proxy settings
- support bypass list
- support online pac script
- support customer proxy rules
- support proxy authentication
- support extension settings synchronize