1. How to bypass webdriver detection in Chrome headless mode?
Environment:
1. selenium-java
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>3.4.0</version>
</dependency>
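1. Problem description:
When WebDriver drives headless Chrome and the site identifies the browser as WebDriver-controlled, the crawler cannot continue collecting data. How can we get past this browser detection and keep collecting?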
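2. How is the browser identified as WebDriver-controlled?
a. In the Chrome console, enter window.navigator.webdriver. It is true if the browser is driven by WebDriver, otherwise undefined.
b. In Java code, if the arguments used to initialize the WebDriver include any one of enable-automation, headless, or remote-debugging-pipe, AutomationControlledEnabled is set to true, and navigator.h then reports webdriver as true. In the options below, enable-automation is excluded, but --headless is still present and is enough to set the flag: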
ChromeOptions options = new ChromeOptions();
String[] a = { "enable-automation" };
options.setExperimentalOption("excludeSwitches", a);
options.addArguments("--headless");
c. The value of window.navigator.webdriver in the browser comes from the webdriver() method in navigator.h: when AutomationControlledEnabled is true, webdriver is true.
See the Chromium source: https://github.com/chromium/chromium/blob/d7da0240cae77824d1eda25745c4022757499131/third_party/blink/renderer/core/frame/navigator.h
bool webdriver() const {
return RuntimeEnabledFeatures::AutomationControlledEnabled();
}
d. When is AutomationControlledEnabled set to true?
See the Chromium source: https://github.com/chromium/chromium/blob/d7da0240cae77824d1eda25745c4022757499131/content/child/runtime_features.cc
If the launch arguments include EnableAutomation, Headless, or RemoteDebuggingPipe, the AutomationControlled flag is set:
{wrf::EnableAutomationControlled, switches::kEnableAutomation, true},
{wrf::EnableAutomationControlled, switches::kHeadless, true},
{wrf::EnableAutomationControlled, switches::kRemoteDebuggingPipe, true},
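To make the rule in the table above concrete, here is a minimal Java sketch of the same decision. It is only a model of the three quoted rows, not Chromium code; the class name AutomationFlagSketch and the method isAutomationControlled are made up for illustration, and the switch names are written the way they are passed on the command line.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the three table rows quoted above (not Chromium code).
class AutomationFlagSketch {
    // Switch names, without the leading "--", that turn AutomationControlled on.
    static final Set<String> TRIGGER_SWITCHES = new LinkedHashSet<>(
            Arrays.asList("enable-automation", "headless", "remote-debugging-pipe"));

    // Returns true if at least one launch argument is a trigger switch.
    static boolean isAutomationControlled(List<String> launchArgs) {
        for (String arg : launchArgs) {
            // Normalize "--name=value" and "name=value" down to "name".
            String name = arg.startsWith("--") ? arg.substring(2) : arg;
            int eq = name.indexOf('=');
            if (eq >= 0) {
                name = name.substring(0, eq);
            }
            if (TRIGGER_SWITCHES.contains(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // enable-automation excluded, but --headless is still there: prints "true"
        System.out.println(isAutomationControlled(
                Arrays.asList("--headless", "window-size=1920,1080")));
        // no trigger switch at all: prints "false"
        System.out.println(isAutomationControlled(
                Arrays.asList("window-size=1920,1080")));
    }
}
```

The first call corresponds to the ChromeOptions shown in 2.b: excluding enable-automation alone does not help as long as --headless remains among the arguments.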
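3. How to bypass the browser's webdriver detection?
a. Method 1: modify navigator.h so that webdriver returns false, then build your own Chromium. This solves the problem at the root.
b. Method 2: run a CDP command that overrides webdriver so that it evaluates to undefined. However, selenium-java 3.4.0 does not support the executeCdpCommand method, so you need a custom ChromiumDriver that adds it.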
ChromiumDriver driver = new ChromiumDriver(chromeCaps);
HashMap<String, Object> cdpCmd = new HashMap<String, Object>();
cdpCmd.put("source", "Object.defineProperty(navigator, 'webdriver', {get: () => undefined }); ");
driver.executeCdpCommand("Page.addScriptToEvaluateOnNewDocument", cdpCmd);
JS command: Object.defineProperty(navigator, 'webdriver', {get: () => undefined});
References: https://www.cnblogs.com/scholarscholar/p/14364822.html
https://chromedevtools.github.io/devtools-protocol/tot/Page/#method-addScriptToEvaluateOnNewDocument
c. Method 3: upgrade selenium-java to the beta release. selenium-java 4.0.0-beta supports the executeCdpCommand method, but upgrading to selenium-java 4.0.0 brings many dependency errors that have to be resolved.
<!-- https://mvnrepository.com/artifact/org.seleniumhq.selenium/selenium-java -->
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.0.0-beta-4</version>
</dependency>
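Whichever driver provides executeCdpCommand, the call boils down to one HTTP POST to ChromeDriver. The sketch below only assembles that request (path and body) with plain Java collections so the shape is easy to see; it does not talk to a browser. The route /session/:sessionId/goog/cdp/execute and the "cmd"/"params" keys are the ones used by the custom driver code in section 4. The class name CdpRequestSketch, its methods, and the session id "abc123" are made up for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that only builds the request; it sends nothing.
class CdpRequestSketch {
    // Script that the page will run before any of its own scripts.
    static final String HIDE_WEBDRIVER_JS =
            "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});";

    // Fills in the ":sessionId" placeholder of the "<vendor>/cdp/execute" route.
    static String endpoint(String sessionId, String vendorKeyword) {
        return "/session/" + sessionId + "/" + vendorKeyword + "/cdp/execute";
    }

    // Request body: the CDP method name under "cmd", its arguments under "params".
    static Map<String, Object> body(String cdpMethod, Map<String, Object> params) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("cmd", cdpMethod);
        body.put("params", params);
        return body;
    }

    public static void main(String[] args) {
        Map<String, Object> params =
                Collections.<String, Object>singletonMap("source", HIDE_WEBDRIVER_JS);
        // "abc123" stands in for a real session id.
        System.out.println("POST " + endpoint("abc123", "goog"));
        System.out.println(body("Page.addScriptToEvaluateOnNewDocument", params));
    }
}
```

Because Page.addScriptToEvaluateOnNewDocument registers the script for every new document, the command has to be sent once after the session is created and before the first page is loaded.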
4. selenium-java 3.4.0 does not support the executeCdpCommand method, so write a custom ChromiumDriver that adds it
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>3.4.0</version>
</dependency>
// ChromiumDriver.java
package com.xxx.selenium;

import java.util.Map;

import org.openqa.selenium.Capabilities;
import org.openqa.selenium.chrome.ChromeDriverService;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.CommandExecutor;
import org.openqa.selenium.remote.RemoteWebDriver;

import com.google.common.collect.ImmutableMap;

public class ChromiumDriver extends RemoteWebDriver {

    public ChromiumDriver(Capabilities capabilities) {
        // "goog" is the vendor keyword used in the ChromeDriver-specific routes.
        this(new ChromiumDriverCommandExecutor("goog", ChromeDriverService.createDefaultService()),
            capabilities, ChromeOptions.CAPABILITY);
    }

    protected ChromiumDriver(CommandExecutor commandExecutor, Capabilities capabilities, String capabilityKey) {
        super(commandExecutor, capabilities);
    }

    /**
     * Launches Chrome app specified by id.
     *
     * @param id Chrome app id.
     */
    public void launchApp(String id) {
        execute(ChromiumDriverCommand.LAUNCH_APP, ImmutableMap.of("id", id));
    }

    /**
     * Execute a Chrome Devtools Protocol command and get returned result. The
     * command and command args should follow
     * <a href="https://chromedevtools.github.io/devtools-protocol/">chrome devtools
     * protocol domains/commands</a>.
     */
    public Map<String, Object> executeCdpCommand(String commandName, Map<String, Object> parameters) {
        @SuppressWarnings("unchecked")
        Map<String, Object> toReturn = (Map<String, Object>) getExecuteMethod().execute(
            ChromiumDriverCommand.EXECUTE_CDP_COMMAND,
            ImmutableMap.of("cmd", commandName, "params", parameters));
        return ImmutableMap.copyOf(toReturn);
    }

    @Override
    public void quit() {
        super.quit();
    }
}

// ChromiumDriverCommand.java
package com.xxx.selenium;

/**
 * Constants for the ChromiumDriver specific command IDs.
 */
final class ChromiumDriverCommand {
    private ChromiumDriverCommand() {}

    static final String LAUNCH_APP = "launchApp";
    static final String GET_NETWORK_CONDITIONS = "getNetworkConditions";
    static final String SET_NETWORK_CONDITIONS = "setNetworkConditions";
    static final String DELETE_NETWORK_CONDITIONS = "deleteNetworkConditions";
    static final String EXECUTE_CDP_COMMAND = "executeCdpCommand";

    // Cast Media Router APIs
    static final String GET_CAST_SINKS = "getCastSinks";
    static final String SET_CAST_SINK_TO_USE = "selectCastSink";
    static final String START_CAST_TAB_MIRRORING = "startCastTabMirroring";
    static final String GET_CAST_ISSUE_MESSAGE = "getCastIssueMessage";
    static final String STOP_CASTING = "stopCasting";

    static final String SET_PERMISSION = "setPermission";
}

// ChromiumDriverCommandExecutor.java
package com.xxx.selenium;

import static java.util.Collections.unmodifiableMap;

import java.util.HashMap;
import java.util.Map;

import org.openqa.selenium.remote.CommandInfo;
import org.openqa.selenium.remote.http.HttpMethod;
import org.openqa.selenium.remote.service.DriverCommandExecutor;
import org.openqa.selenium.remote.service.DriverService;

/**
 * {@link DriverCommandExecutor} that understands ChromiumDriver specific commands.
 *
 * @see <a href="https://chromium.googlesource.com/chromium/src/+/master/chrome/test/chromedriver/client/command_executor.py">List of ChromeWebdriver commands</a>
 */
public class ChromiumDriverCommandExecutor extends DriverCommandExecutor {

    private static Map<String, CommandInfo> buildChromiumCommandMappings(String vendorKeyword) {
        String sessionPrefix = "/session/:sessionId/";
        String chromiumPrefix = sessionPrefix + "chromium";
        String vendorPrefix = sessionPrefix + vendorKeyword;

        HashMap<String, CommandInfo> mappings = new HashMap<>();

        mappings.put(ChromiumDriverCommand.LAUNCH_APP,
            new CommandInfo(chromiumPrefix + "/launch_app", HttpMethod.POST));

        String networkConditions = chromiumPrefix + "/network_conditions";
        mappings.put(ChromiumDriverCommand.GET_NETWORK_CONDITIONS,
            new CommandInfo(networkConditions, HttpMethod.GET));
        mappings.put(ChromiumDriverCommand.SET_NETWORK_CONDITIONS,
            new CommandInfo(networkConditions, HttpMethod.POST));
        mappings.put(ChromiumDriverCommand.DELETE_NETWORK_CONDITIONS,
            new CommandInfo(networkConditions, HttpMethod.DELETE));

        // This is the route that executeCdpCommand uses.
        mappings.put(ChromiumDriverCommand.EXECUTE_CDP_COMMAND,
            new CommandInfo(vendorPrefix + "/cdp/execute", HttpMethod.POST));

        // Cast / Media Router APIs
        String cast = vendorPrefix + "/cast";
        mappings.put(ChromiumDriverCommand.GET_CAST_SINKS,
            new CommandInfo(cast + "/get_sinks", HttpMethod.GET));
        mappings.put(ChromiumDriverCommand.SET_CAST_SINK_TO_USE,
            new CommandInfo(cast + "/set_sink_to_use", HttpMethod.POST));
        mappings.put(ChromiumDriverCommand.START_CAST_TAB_MIRRORING,
            new CommandInfo(cast + "/start_tab_mirroring", HttpMethod.POST));
        mappings.put(ChromiumDriverCommand.GET_CAST_ISSUE_MESSAGE,
            new CommandInfo(cast + "/get_issue_message", HttpMethod.GET));
        mappings.put(ChromiumDriverCommand.STOP_CASTING,
            new CommandInfo(cast + "/stop_casting", HttpMethod.POST));

        // sessionPrefix already ends with "/", so no leading slash here.
        mappings.put(ChromiumDriverCommand.SET_PERMISSION,
            new CommandInfo(sessionPrefix + "permissions", HttpMethod.POST));

        return unmodifiableMap(mappings);
    }

    public ChromiumDriverCommandExecutor(String vendorPrefix, DriverService service) {
        super(service, buildChromiumCommandMappings(vendorPrefix));
    }
}

// DriverUtil.java
package com.xxx.selenium;

import java.util.HashMap;

import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.remote.DesiredCapabilities;

public class DriverUtil {

    /**
     * Returns a ChromiumDriver that can execute CDP commands and can therefore bypass webdriver detection.
     * 1. https://intoli.com/blog/not-possible-to-block-chrome-headless/
     * 2. https://intoli.com/blog/making-chrome-headless-undetectable/
     * 3. https://github.com/chromium/chromium/blob/d7da0240cae77824d1eda25745c4022757499131/third_party/blink/renderer/core/frame/navigator.h
     *
     * @return a ChromiumDriver with the navigator.webdriver override registered
     */
    public ChromiumDriver getChromiumDriver() {
        // Path to the ChromeDriver executable. I keep it under the project directory;
        // this driver is what opens the local Chrome browser for you.
        String driverFilePath = "path to the ChromeDriver executable";
        if (driverFilePath != null && !driverFilePath.isEmpty()) {
            System.setProperty("webdriver.chrome.driver", driverFilePath);
        }

        // Initial configuration for Chrome
        HashMap<String, Object> prefs = new HashMap<String, Object>();
        ChromeOptions options = new ChromeOptions();
        options.setExperimentalOption("prefs", prefs);
        String[] a = { "enable-automation" };
        options.setExperimentalOption("excludeSwitches", a);
        options.addArguments("--headless");
        options.addArguments("window-size=1920,1080");
        String ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36";
        options.addArguments(String.format("--user-agent=%s", ua));

        DesiredCapabilities chromeCaps = DesiredCapabilities.chrome();
        chromeCaps.setCapability(ChromeOptions.CAPABILITY, options);

        // Run the CDP command that overrides webdriver so it evaluates to undefined
        ChromiumDriver driver = new ChromiumDriver(chromeCaps);
        HashMap<String, Object> cdpCmd = new HashMap<String, Object>();
        cdpCmd.put("source", "Object.defineProperty(navigator, 'webdriver', {get: () => undefined }); ");
        driver.executeCdpCommand("Page.addScriptToEvaluateOnNewDocument", cdpCmd);
        return driver;
    }
}