Crawling with selenium and chromedriver: when the number of pages is very large, memory usage keeps growing until the program crashes

Reposted article. 2019-08-14 17:17. Tags: crawler / python

When a crawler built on selenium and chromedriver fetches a very large number of pages, the memory used by the browser grows steadily until the program crashes.

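Solution: open a new window, close the current one, and request the next page in the new window. (Note: a "window" in phantomjs is really what chrome calls a tab. phantomjs is a headless browser, so unlike chrome it has no need to show several tabs in separate "windows".)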

The code is as follows:

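from bs4 import BeautifulSoup
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_options = Options()
chrome_options.add_argument('--headless')  # Run without a GUI to improve efficiency
# Start one browser (Selenium 4 style; on Selenium 3, pass executable_path=... and chrome_options=... instead)
service = Service(r'C:\ProgramData\Anaconda3\chromedriver.exe')
browser = webdriver.Chrome(service=service, options=chrome_options)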

for i in range(1000):  # Loop many times so that memory usage is easy to observe
    time.sleep(2)
    browser.get('https://www.baidu.com/')
    html = browser.page_source
    soup = BeautifulSoup(html, 'html.parser')

    # Open a new window first, so that one window is left after the close below
    browser.execute_script('window.open("https://www.sogou.com");')
    print(browser.window_handles)
    browser.close()  # Close the current window
    print(browser.window_handles)
    # The closed window's handle is no longer valid, so switch to the remaining window
    browser.switch_to.window(browser.window_handles[-1])

    print(soup.prettify())
    print("*******************************************************************************************\n\n")

browser.quit()
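
The open, close and switch steps in the loop above could be factored into a small helper. The following is only a sketch: the name `recycle_window` and the default URL `about:blank` are assumptions for illustration and do not come from the original code, and it assumes the driver has exactly one window open when it is called.

```python
def recycle_window(driver, url="about:blank"):
    """Replace the driver's current window with a fresh one.

    Hypothetical helper (not part of selenium). Assumes `driver` is a
    selenium WebDriver with exactly one window open.
    """
    # Open the new window first, so the session still has a window
    # after the current one is closed.
    driver.execute_script("window.open(arguments[0]);", url)
    # Close the window the driver is currently focused on.
    driver.close()
    # Only the new window is left; its handle is the last (and only) entry.
    new_handle = driver.window_handles[-1]
    # The driver's focus still points at the closed window, so switch explicitly.
    driver.switch_to.window(new_handle)
    return new_handle
```

With this in place, the body of the loop could call `recycle_window(browser)` after parsing each page instead of repeating the three steps inline. Whether this fully removes the memory growth depends on the browser and driver versions, so it is worth watching the process memory while it runs.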
