How to capture request information with Selenium


Page performance monitoring

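Many companies run page performance checks, and there are many ways to do it. The simplest is to use JavaScript. JS can easily call the browser's APIs to read the page's network information, and there is plenty of material on this. The only difficulty is injecting the JS script. The industry generally uses one of two methods: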

  • use Selenium's execute_script to inject the JS script
  • add a sub_filter directive at the nginx layer to inject the JS script
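For reference, here is a minimal sketch of the first method. It assumes a Selenium WebDriver (or any object with an `execute_script(script)` method). `execute_script` runs a snippet in the page that reads the browser's standard Resource Timing entries. `performance.getEntriesByType` and `JSON.stringify` are standard browser APIs. The function name `collect_resource_timings` and the chosen output fields are hypothetical.

```python
import json
from typing import Any, Dict, List

# JavaScript run inside the page. PerformanceResourceTiming entries implement
# toJSON(), so JSON.stringify turns them into plain objects we can decode here.
RESOURCE_TIMING_JS = (
    "return JSON.stringify(performance.getEntriesByType('resource'));"
)


def collect_resource_timings(driver: Any) -> List[Dict[str, Any]]:
    """Return one small dict per resource loaded by the current page.

    `driver` is assumed to be a Selenium WebDriver, or anything whose
    execute_script(script) returns the script's string result.
    """
    raw = driver.execute_script(RESOURCE_TIMING_JS)
    entries = json.loads(raw)
    # Keep only a few fields for a quick per-resource overview.
    return [
        {
            "name": e.get("name"),
            "duration_ms": e.get("duration"),
            "transfer_size": e.get("transferSize"),
        }
        for e in entries
    ]
```

Typical use would be `driver.get('https://www.baidu.com')` followed by `collect_resource_timings(driver)`. Note that browsers report `transferSize` as 0 for cross-origin resources that do not send a `Timing-Allow-Origin` header.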

I have not tried either method in practice yet. The details are left for further research.

Getting page data without injecting a JS script

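Our team felt that both methods above rely on injecting JS, which is too intrusive to the code under test and may not give objective results. So we kept looking for other approaches. So far the only one we could think of is to use Selenium and ChromeDriver and see whether page request information can be extracted from them. This article just records that investigation.
On Stack Overflow I found the question "Browser performance tests through selenium". It gives an example that does retrieve the relevant logs, though the volume of log data is huge: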

import json
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Older approach:
# caps = DesiredCapabilities.CHROME
# caps['loggingPrefs'] = {'performance': 'ALL'}

# Chrome 78 needs the following, see https://stackoverflow.com/questions/56812190/protractor-log-type-performance-not-found-error
caps = {
    'browserName': 'chrome',
    'loggingPrefs': {
        'browser': 'ALL',
        'driver': 'ALL',
        'performance': 'ALL',
    },
    'goog:chromeOptions': {
        'perfLoggingPrefs': {
            'enableNetwork': True,  # record Network.* events
        },
        'w3c': False,  # keep the legacy protocol so 'loggingPrefs' is honoured
    },
}
# Selenium 3 style API (desired_capabilities no longer exists in recent Selenium 4)
driver = webdriver.Chrome(desired_capabilities=caps)
driver.get('https://www.baidu.com')
# Each entry's 'message' is a JSON string; keep its inner 'message' object
logs = [json.loads(log['message'])['message'] for log in driver.get_log('performance')]
# json.dump writes str, so the file must be opened in text mode
with open('devtools.json', 'w') as f:
    json.dump(logs, f)
driver.close()

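I tried it, and it launches Chrome on the PC. However, I need to run on an Android device, so the Chrome configuration has to change. The code is below and is based on these references: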
https://sites.google.com/a/chromium.org/chromedriver/capabilities
https://github.com/webdriverio/webdriverio/issues/476

from pprint import pprint
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Copy so the shared DesiredCapabilities.CHROME dict is not modified
caps = DesiredCapabilities.CHROME.copy()
caps['loggingPrefs'] = {'performance': 'ALL'}

options = webdriver.ChromeOptions()
# Start Chrome on the attached Android device instead of on the PC
options.add_experimental_option('androidPackage', 'com.android.chrome')
# Record Network.* events in the performance log
options.add_experimental_option('perfLoggingPrefs', {'enableNetwork': True})

driver = webdriver.Chrome(chrome_options=options, desired_capabilities=caps)
driver.get('https://www.baidu.com')
log = driver.get_log('performance')
pprint(log)
driver.quit()
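The same settings can also be collected into a single capabilities dict, as in the first example. The sketch below assumes a ChromeDriver running in W3C mode. In that mode the logging key is `goog:loggingPrefs` rather than `loggingPrefs`, which is the issue the Stack Overflow link above is about. `androidPackage`, `androidDeviceSerial` and `perfLoggingPrefs` are documented ChromeDriver `goog:chromeOptions` fields. The helper name `android_perf_capabilities` is hypothetical.

```python
from typing import Any, Dict, Optional


def android_perf_capabilities(
    package: str = "com.android.chrome",
    device_serial: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one capabilities dict for Chrome on Android with perf logging.

    Assumes ChromeDriver in W3C mode, where logging preferences live under
    'goog:loggingPrefs'.
    """
    chrome_options: Dict[str, Any] = {
        # Which Chrome package on the device ChromeDriver should start.
        "androidPackage": package,
        # Ask ChromeDriver to record Network.* events in the performance log.
        "perfLoggingPrefs": {"enableNetwork": True},
    }
    if device_serial is not None:
        # Only needed when more than one device is attached (see `adb devices`).
        chrome_options["androidDeviceSerial"] = device_serial
    return {
        "browserName": "chrome",
        "goog:loggingPrefs": {"performance": "ALL"},
        "goog:chromeOptions": chrome_options,
    }
```

With the Selenium 3 API used above, this would be passed as `webdriver.Chrome(desired_capabilities=android_perf_capabilities())`. That pairing is an untested assumption.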

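Once we have the log, we can start analyzing it. There is a lot of it: pretty-printed, it runs to more than three thousand lines.
The outermost layer is one big array containing a number of dicts, like this: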

[
{'level': 'INFO',
 'message': '{"message":...}',
 'timestamp': 1515665118083},
{'level': 'INFO',
 'message': '{"message":...}',
 'timestamp': 1515665118083}
 ]
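The outer layer carries little beyond `level` and `timestamp`, so a natural first step is to decode the JSON string in each entry's `message`. This is a minimal sketch, assuming entries shaped like the array above. The function name `decode_performance_log` is hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def decode_performance_log(entries: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn raw driver.get_log('performance') entries into DevTools events.

    Each entry looks like {'level': ..., 'message': '<json string>',
    'timestamp': ...}. The decoded string has its own 'message' key holding
    {'method': ..., 'params': ...} and a 'webview' key.
    """
    events = []
    for entry in entries:
        outer = json.loads(entry["message"])
        event = outer["message"]
        # Keep the webview id as well, in case several pages are being logged.
        event["webview"] = outer.get("webview")
        events.append(event)
    return events
```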

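Not much can be extracted at this layer. The important information is all in the message field, which is a very long JSON string. Decoded, it contains its own message and webview fields. Pretty-printed, it looks like this: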

{'message': {'method': 'Network.requestWillBeSent',
             'params': {'documentURL': 'https://www.baidu.com/',
                        'frameId': '9616.1',
                        'initiator': {'type': 'other'},
                        'loaderId': '9616.4',
                        'request': {'headers': {'Upgrade-Insecure-Requests': '1',
                                                'User-Agent': 'Mozilla/5.0 '
                                                              '(Linux; Android '
                                                              '4.4.4; SM-A7000 '
                                                              'Build/KTU84P) '
                                                              'AppleWebKit/537.36 '
                                                              '(KHTML, like '
                                                              'Gecko) '
                                                              'Chrome/58.0.3029.83 '
                                                              'Mobile '
                                                              'Safari/537.36'},
                                    'initialPriority': 'VeryHigh',
                                    'method': 'GET',
                                    'mixedContentType': 'none',
                                    'referrerPolicy': 'no-referrer-when-downgrade',
                                    'url': 'https://www.baidu.com/'},
                        'requestId': '9616.88',
                        'timestamp': 6117.672547,
                        'type': 'Document',
                        'wallTime': 1515665221.86887}},
 'webview': '0'}

Breaking down message['message'] further, it generally contains two keys, method and params. The main methods are:

Network.requestWillBeSent
Network.responseReceived
Network.dataReceived # fires multiple times
Network.loadingFinished
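Because every one of these events carries a `requestId`, the events can be regrouped per request to follow each request's lifecycle. The sketch below assumes events already decoded into `{'method': ..., 'params': ...}` dicts. It also adds `Network.loadingFailed`, which appears in the fuller list that follows and ends a request that did not complete. The function name `group_by_request` is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Network.* events that belong to a single request's lifecycle.
REQUEST_METHODS = {
    "Network.requestWillBeSent",
    "Network.responseReceived",
    "Network.dataReceived",
    "Network.loadingFinished",
    "Network.loadingFailed",
}


def group_by_request(events: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each requestId to the ordered list of methods seen for it."""
    lifecycle: Dict[str, List[str]] = defaultdict(list)
    for event in events:
        if event.get("method") in REQUEST_METHODS:
            request_id = event["params"]["requestId"]
            lifecycle[request_id].append(event["method"])
    return dict(lifecycle)
```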

The events emitted over the various stages of a page load are:

Page.frameStartedLoading
Network.requestWillBeSent
Network.responseReceived
Network.dataReceived
Page.frameNavigated
Network.requestServedFromCache
Network.loadingFinished
Network.resourceChangedPriority
Page.domContentEventFired
Network.loadingFailed
Page.loadEventFired
Page.frameStoppedLoading
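The `Page.domContentEventFired` and `Page.loadEventFired` events can give page-level milestones. This is a minimal sketch under two assumptions. First, their `params.timestamp` is in seconds on the same monotonic clock as the `Network.*` timestamps (the DevTools protocol's `MonotonicTime`). Second, the log covers a single navigation, so the earliest `Network.requestWillBeSent` marks the start. The function names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def _ms_since(start: Optional[float], ts: Optional[float]) -> Optional[float]:
    """Milliseconds from `start` to `ts` (both in seconds), or None."""
    if start is None or ts is None:
        return None
    return round((ts - start) * 1000, 1)


def page_milestones(events: List[Dict[str, Any]]) -> Dict[str, Optional[float]]:
    """DOMContentLoaded and load times relative to the earliest request."""
    start: Optional[float] = None
    dom_content: Optional[float] = None
    load: Optional[float] = None
    for event in events:
        method = event.get("method")
        ts = event.get("params", {}).get("timestamp")
        if ts is None:
            continue
        if method == "Network.requestWillBeSent":
            start = ts if start is None else min(start, ts)
        elif method == "Page.domContentEventFired" and dom_content is None:
            dom_content = ts
        elif method == "Page.loadEventFired" and load is None:
            load = ts
    return {
        "domContentLoaded_ms": _ms_since(start, dom_content),
        "load_ms": _ms_since(start, load),
    }
```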

A normal request roughly goes through the following sequence of events:

{'method': 'Network.requestWillBeSent', 'params': {'documentURL': 'https://www.baidu.com/', 'frameId': '9616.1', 'initiator': {'type': 'other'}, 'loaderId': '9616.4', 'request': {'headers': {'Upgrade-Insecure-Requests': '1', 'User-Agent': 'Mozilla/5.0 (Linux; Android ' '4.4.4; SM-A7000 ' 'Build/KTU84P) ' 'AppleWebKit/537.36 (KHTML, ' 'like Gecko) ' 'Chrome/58.0.3029.83 Mobile ' 'Safari/537.36'}, 'initialPriority': 'VeryHigh', 'method': 'GET', 'mixedContentType': 'none', 'referrerPolicy': 'no-referrer-when-downgrade', 'url': 'https://www.baidu.com/'}, 'requestId': '9616.88', 'timestamp': 6117.672547, 'type': 'Document', 'wallTime': 1515665221.86887}}
{'method': 'Network.responseReceived', 'params': {'frameId': '9616.1', 'loaderId': '9616.4', 'requestId': '9616.88', 'response': {'connectionId': 707, 'connectionReused': False, 'encodedDataLength': 875, 'fromDiskCache': False, 'fromServiceWorker': False, 'headers': {'Cache-Control': 'no-cache', 'Connection': 'Keep-Alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html;charset=utf-8', 'Coremonitorno': '0', 'Date': 'Thu, 11 Jan 2018 10:07:03 GMT', 'Server': 'apache', 'Set-Cookie': 'H_WISE_SIDS=102206_116791_102629_121253_120157_118886_118865_118852_118825_118802_107316_121254_121535_121214_117333_117431_121666_120590_121563_121044_121421_120944_121042_121363_121153_114819_120550_120852_119324_121325_116408_110085_118758; ' 'path=/; domain=.baidu.com\n' 'bd_traffictrace=111752_111807\n' 'BDSVRTM=278; path=/\n' 'eqid=deleted; path=/; ' 'domain=.baidu.com; ' 'expires=Thu, 01 Jan 1970 ' '00:00:00 GMT', 'Strict-Transport-Security': 'max-age=172800', 'Tracecode': '04233185700634468874011118\n' '04230518060407801866011118', 'Traceid': '151566522306294144108392574322183443433', 'Transfer-Encoding': 'chunked', 'Vary': 'Accept-Encoding'}, 'headersText': 'HTTP/1.1 200 OK\r\n' 'Cache-Control: no-cache\r\n' 'Connection: Keep-Alive\r\n' 'Content-Encoding: gzip\r\n' 'Content-Type: ' 'text/html;charset=utf-8\r\n' 'Coremonitorno: 0\r\n' 'Date: Thu, 11 Jan 2018 10:07:03 ' 'GMT\r\n' 'Server: apache\r\n' 'Set-Cookie: ' 'H_WISE_SIDS=102206_116791_102629_121253_120157_118886_118865_118852_118825_118802_107316_121254_121535_121214_117333_117431_121666_120590_121563_121044_121421_120944_121042_121363_121153_114819_120550_120852_119324_121325_116408_110085_118758; ' 'path=/; domain=.baidu.com\r\n' 'Set-Cookie: ' 'bd_traffictrace=111752_111807\r\n' 'Set-Cookie: BDSVRTM=278; path=/\r\n' 'Set-Cookie: eqid=deleted; path=/; ' 'domain=.baidu.com; expires=Thu, 01 ' 'Jan 1970 00:00:00 GMT\r\n' 'Strict-Transport-Security: ' 'max-age=172800\r\n' 'Tracecode: ' '04233185700634468874011118\r\n' 'Tracecode: ' '04230518060407801866011118\r\n' 'Traceid: ' '151566522306294144108392574322183443433\r\n' 'Vary: Accept-Encoding\r\n' 'Transfer-Encoding: chunked\r\n' '\r\n', 'mimeType': 'text/html', 'protocol': 'http/1.1', 'remoteIPAddress': '61.135.169.125', 'remotePort': 443, 'requestHeaders': {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8', 'Accept-Encoding': 'gzip, deflate, ' 'sdch, br', 'Accept-Language': 'zh-CN,zh;q=0.8', 'Cache-Control': 'max-age=0', 'Connection': 'keep-alive', 'Cookie': 'BAIDUID=21B2249D2761E4A9CE64A356CFA0A21D:FG=1; ' 'bd_traffictrace=111752; ' 'BDSVRTM=22; ' 'plus_lsv=bddd969db4a4e207; ' 'BDORZ=AE84CDB3A529C0F8A2B9DCDD1D18B695; ' 'plus_cv=1::m:21732389; ' 'Hm_lvt_12423ecbc0e2ca965d84259063d35238=1515664352; ' 'Hm_lpvt_12423ecbc0e2ca965d84259063d35238=1515664352; ' 'H_WISE_SIDS=102206_116791_102629_121253_120157_118886_118865_118852_118825_118802_107316_121254_121535_121214_117333_117431_121666_120590_121563_121044_121421_120944_121042_121363_121153_114819_120550_120852_119324_121325_116408_110085_118758', 'Host': 'www.baidu.com', 'Upgrade-Insecure-Requests': '1', 'User-Agent': 'Mozilla/5.0 (Linux; ' 'Android 4.4.4; ' 'SM-A7000 ' 'Build/KTU84P) ' 'AppleWebKit/537.36 ' '(KHTML, like Gecko) ' 'Chrome/58.0.3029.83 ' 'Mobile ' 'Safari/537.36'}, 'securityState': 'secure', 'status': 200, 'statusText': 'OK', 'timing': {'connectEnd': 352.51100000005, 'connectStart': 0, 'dnsEnd': 0, 'dnsStart': 0, 'proxyEnd': -1, 'proxyStart': -1, 'pushEnd': 0, 'pushStart': 0, 'receiveHeadersEnd': 1468.39399999953, 'requestTime': 6117.724403, 'sendEnd': 1062.45399999989, 'sendStart': 1061.98099999983, 'sslEnd': 352.458999999726, 'sslStart': 107.189999999719, 'workerReady': -1, 'workerStart': -1}, 'url': 'https://www.baidu.com/'}, 'timestamp': 6119.205715, 'type': 'Document'}}
{'method': 'Network.dataReceived', 'params': {'dataLength': 7150, 'encodedDataLength': 0, 'requestId': '9616.88', 'timestamp': 6119.209894}}
{'method': 'Network.dataReceived', 'params': {'dataLength': 32768, 'encodedDataLength': 3221, 'requestId': '9616.88', 'timestamp': 6119.256851}}
{'method': 'Network.loadingFinished', 'params': {'encodedDataLength': 42585, 'requestId': '9616.88', 'timestamp': 6119.430418}}

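All the information we need is in Network.responseReceived: url, timing, remoteIPAddress and the status code. However, the encodedDataLength inside response is only 875, which does not look right. Getting the size of a request is fairly complicated, and the method differs between kinds of requests. I have not yet worked out how to get it.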
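One possible explanation for the 875 comes from the DevTools protocol's Network domain documentation. There, `response.encodedDataLength` is the number of bytes received *so far* when the response event fires. The `Network.loadingFinished` event carries the total bytes received for the request, which is 42585 in the log above. The sketch below therefore takes the size from `loadingFinished`. It also splits the `timing` object into phases. It assumes the timing fields are milliseconds relative to `requestTime`, and that -1 marks a phase that did not happen, as with `proxyStart` in the sample. The names `summarize_requests` and `_phase` are hypothetical.

```python
from typing import Any, Dict, List, Optional


def _phase(timing: Dict[str, float], start_key: str, end_key: str) -> Optional[float]:
    """Duration in ms between two timing fields, or None if either is missing/-1."""
    start, end = timing.get(start_key, -1), timing.get(end_key, -1)
    if start < 0 or end < 0:
        return None
    return round(end - start, 3)


def summarize_requests(events: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Per-requestId summary: url, status, IP, timing phases and total size."""
    summary: Dict[str, Dict[str, Any]] = {}
    for event in events:
        method = event.get("method")
        params = event.get("params", {})
        request_id = params.get("requestId")
        if method == "Network.responseReceived":
            response = params["response"]
            # Cached responses may carry no timing object at all.
            timing = response.get("timing") or {}
            summary.setdefault(request_id, {}).update({
                "url": response.get("url"),
                "status": response.get("status"),
                "remoteIPAddress": response.get("remoteIPAddress"),
                "dns_ms": _phase(timing, "dnsStart", "dnsEnd"),
                "connect_ms": _phase(timing, "connectStart", "connectEnd"),
                "ssl_ms": _phase(timing, "sslStart", "sslEnd"),
                "send_ms": _phase(timing, "sendStart", "sendEnd"),
                # Time from finishing the send to receiving response headers.
                "wait_ms": _phase(timing, "sendEnd", "receiveHeadersEnd"),
            })
        elif method == "Network.loadingFinished":
            # Total encoded bytes for the whole request, not just the headers.
            summary.setdefault(request_id, {})["encodedDataLength"] = params.get("encodedDataLength")
    return summary
```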

How to get the memory used by a web page

References:

https://sites.google.com/a/chromium.org/chromedriver/logging/performance-log
https://chromedevtools.github.io/devtools-protocol/
https://stackoverflow.com/questions/27596229/browser-performance-tests-through-selenium/
https://github.com/webdriverio/webdriverio/issues/476
https://stackoverflow.com/questions/27657655/unable-to-get-performance-logs
https://groups.google.com/forum/#!topic/google-chrome-developer-tools/FCCV2J7BaIY
https://chromedevtools.github.io/devtools-protocol/tot/Network/



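Author: leyu
Link: https://www.jianshu.com/p/615e3c0140a5
Source: Jianshu
Copyright belongs to the author. For commercial reposting, please contact the author for permission; for non-commercial reposting, please credit the source.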

