Selenium Tutorial

Reposted article. 2019-05-23 15:20. Category: web crawlers

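Selenium is a web automation testing toolkit, but it is also widely used for web crawling, where it handles cases that plain HTTP requests cannot, such as pages rendered by JavaScript.

When used for crawling, Selenium drives a real browser the way a person would.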

Browser drivers

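Before using Selenium you need to install a browser driver. Selenium supports more than a dozen browser types; the commonly used ones are:

chrome: Google Chrome, driven by ChromeDriver. The driver version must match the browser version; the same applies to the browsers below.

firefox: Mozilla Firefox, driven by geckodriver.

ie: Internet Explorer, driven by IEDriverServer. It is awkward to work with.

phantomjs: a headless browser (no user interface) that is fast to run; a later post is planned to cover it in detail. Its development has since been suspended, and headless Chrome or Firefox is the usual substitute.

safari: Apple's browser on macOS and iOS.

Put the driver executable in a directory that is already on the PATH environment variable (for example the Python install directory, such as C:\python2), or add the driver's own directory to PATH.

For detailed installation steps, search the web for "selenium browser driver download".

Note that on Linux you must install the Linux build of the driver.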
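A quick way to confirm that a driver really is reachable through PATH is to ask Python to resolve it. The sketch below uses only the standard library; the helper names find_driver and report_drivers are made up for illustration, and the executable names chromedriver and geckodriver are assumed to be the ones you installed.

```python
import shutil


def find_driver(executable_name):
    """Return the full path of a driver executable found on PATH, or None."""
    return shutil.which(executable_name)


def report_drivers(names=("chromedriver", "geckodriver")):
    """Map each driver name to its resolved path (None when it is not on PATH)."""
    return {name: find_driver(name) for name in names}
```

Calling report_drivers() before starting a crawl gives an early, readable hint when a driver is missing, instead of a startup error from the browser launch.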

Basic usage

1. Create the browser object

from selenium import webdriver

# Create the browser instance
# firefox_login = webdriver.Ie()   # or webdriver.Firefox()
firefox_login=webdriver.Chrome()

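At this step you can enable headless mode, which hides the browser window while it is being driven.

options = webdriver.ChromeOptions()
options.add_argument('--headless')      # optional: run without a visible window

firefox_login = webdriver.Chrome(options=options)   # older releases spelled this keyword chrome_options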
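If several scripts need the same headless setup, it can help to keep the option handling in one small function. This is only a sketch: configure_headless and make_headless_chrome are hypothetical helper names, the --window-size argument is an extra choice of mine (headless pages otherwise render at a small default size), and the options= keyword is the one accepted by current Selenium releases.

```python
def configure_headless(options, width=1920, height=1080):
    """Add headless-related arguments to a ChromeOptions-like object and return it."""
    options.add_argument("--headless")
    options.add_argument("--window-size={},{}".format(width, height))
    return options


def make_headless_chrome():
    """Start Chrome with the options above (needs selenium and a matching chromedriver)."""
    # Imported lazily so configure_headless can be reused without selenium installed.
    from selenium import webdriver

    options = configure_headless(webdriver.ChromeOptions())
    return webdriver.Chrome(options=options)
```

Usage would be browser = make_headless_chrome(), followed by the same get() and find calls as in the rest of this tutorial.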

2. Visit a page

firefox_login.get('http://www.renren.com/')
# firefox_login.maximize_window()   # maximize the window; optional, depending on the case
firefox_login.minimize_window()

3. Find elements and interact

firefox_login.find_element_by_id('email').clear()
firefox_login.find_element_by_id('email').send_keys('xxx@sina.com')

Summary of element lookup methods

find_element_by_name
find_element_by_id
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector

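The methods above return a single element; to get every match, change element to elements (for example find_elements_by_name). These find_element_by_* helpers were removed in Selenium 4.3, so on current releases only the generic form below is available.

There is also a more general method:

from selenium.webdriver.common.by import By    # this import is required

browser = webdriver.Chrome()
browser.get("http://www.taobao.com")

input_first = browser.find_element(By.ID, "q")    # By.ID can be swapped for any other strategy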
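To make the link between the two styles concrete, here is a rough mapping from each find_element_by_* name to the strategy string that find_element() expects. The strings mirror the values of Selenium's By constants (By.ID is "id", By.CSS_SELECTOR is "css selector", and so on); the table name LEGACY_TO_STRATEGY and the helper find_with_legacy_name are invented for this sketch.

```python
# Strategy strings as defined by selenium.webdriver.common.by.By
LEGACY_TO_STRATEGY = {
    "find_element_by_id": "id",
    "find_element_by_name": "name",
    "find_element_by_xpath": "xpath",
    "find_element_by_link_text": "link text",
    "find_element_by_partial_link_text": "partial link text",
    "find_element_by_tag_name": "tag name",
    "find_element_by_class_name": "class name",
    "find_element_by_css_selector": "css selector",
}


def find_with_legacy_name(driver, legacy_name, value):
    """Look up one element through find_element(), given an old-style method name."""
    strategy = LEGACY_TO_STRATEGY[legacy_name]
    return driver.find_element(strategy, value)
```

For example, find_with_legacy_name(browser, "find_element_by_id", "q") performs the same lookup as browser.find_element(By.ID, "q").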

4. Operate the browser

firefox_login.find_element_by_id('login').click()

Several operations can be queued in an action chain and executed in order:

from selenium import webdriver
from selenium.webdriver import ActionChains

browser = webdriver.Chrome()
url = "http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable"
browser.get(url)
# The demo lives inside an iframe, so switch into it before locating elements
browser.switch_to.frame('iframeResult')
source = browser.find_element_by_css_selector('#draggable')
target = browser.find_element_by_css_selector('#droppable')
actions = ActionChains(browser)
actions.drag_and_drop(source, target)
actions.perform()

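The code above drags one element and drops it onto another.

Executing JavaScript

You can also drive the browser directly with JavaScript: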

from selenium import webdriver
browser = webdriver.Chrome()
browser.get("http://www.zhihu.com/explore")
browser.execute_script('window.scrollTo(0, document.body.scrollHeight)')
browser.execute_script('alert("To Bottom")')
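A common crawling use of execute_script is scrolling a lazily loaded page until no more content appears. The following is a minimal sketch under a few assumptions: the page grows document.body.scrollHeight as it loads more items, a fixed pause is enough for each load, and scroll_to_bottom is a helper name chosen here, not part of Selenium.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until the page height stops growing.

    Returns the number of scrolls performed (at most max_rounds).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for round_number in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give the page time to load more content
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return round_number  # nothing new was loaded, so stop
        last_height = new_height
    return max_rounds
```

The max_rounds limit keeps the loop from running forever on pages with endless feeds.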

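5. Output and close

print(firefox_login.current_url)
print(firefox_login.page_source)

# Shut down the browser
# firefox_login.close()   # closes only the current window
firefox_login.quit()      # closes every window and ends the driver session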
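If the script raises an exception before reaching quit(), the browser and driver processes are left running. One way to guard against that is a small context manager; this is a sketch, and managed_browser is a hypothetical helper name rather than anything Selenium provides.

```python
from contextlib import contextmanager


@contextmanager
def managed_browser(factory):
    """Create a driver with factory() and always call quit() on it afterwards."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()  # runs even when the body raises
```

With Selenium installed, usage would look like: with managed_browser(webdriver.Chrome) as browser: browser.get(url).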

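Getting element attributes

Use get_attribute() to read an HTML attribute, for example get_attribute('class'):

logo = browser.find_element_by_id('zh-top-link-logo')
print(logo.get_attribute('class'))

Text: logo.text

ID: logo.id (Selenium's internal reference to the element, not the HTML id attribute)

Position: logo.location

Tag name: logo.tag_name

Size: logo.size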
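When debugging a locator it can be handy to dump these properties in one go. The helper below is a sketch that only reads the attributes listed above; describe_element is a name picked for this example.

```python
def describe_element(element):
    """Collect the commonly inspected properties of a WebElement into a dict."""
    return {
        "tag": element.tag_name,
        "text": element.text,
        "class": element.get_attribute("class"),
        "location": element.location,  # dict with x and y
        "size": element.size,          # dict with width and height
    }
```

For instance, print(describe_element(logo)) shows at a glance whether the element you found is the one you meant.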

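Advanced usage

Beyond the basic operations, several special situations need their own handling.

Frames

Many pages contain frame or iframe elements. To work with content inside a frame you must first switch into it, and switch back out when you are done.

Switch in with switch_to.frame() and out with switch_to.parent_frame().

Example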

# encoding:utf-8

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

browser = webdriver.Chrome()
url = 'http://www.runoob.com/try/try.php?filename=jqueryui-api-droppable'
browser.get(url)
browser.switch_to.frame('iframeResult')     # enter the frame; 'iframeResult' is the iframe's id
source = browser.find_element_by_css_selector('#draggable')
print(source)
try:
    # the logo is part of the outer page, so this lookup is expected to fail inside the frame
    logo = browser.find_element_by_class_name('logo')
except NoSuchElementException:
    print('NO LOGO')
browser.switch_to.parent_frame()        # leave the frame
logo = browser.find_element_by_class_name('logo')
print(logo)
print(logo.text)

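Forgetting to switch back out of a frame is an easy mistake, because every later lookup then silently searches the wrong document. A context manager can pair the two calls; the sketch below assumes the driver exposes switch_to.frame() and switch_to.parent_frame() as used above, and inside_frame is a hypothetical helper name.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_reference):
    """Switch into a frame for the duration of a with-block, then switch back out."""
    driver.switch_to.frame(frame_reference)
    try:
        yield driver
    finally:
        driver.switch_to.parent_frame()  # always return to the enclosing document
```

With it, the frame part of the example becomes: with inside_frame(browser, 'iframeResult'): followed by the lookups that belong inside the frame.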

Waiting

Driving a browser often involves waiting for content to load. Selenium provides two kinds of wait: explicit and implicit.

Implicit wait

from selenium import webdriver

browser = webdriver.Chrome()
browser.implicitly_wait(100)    # wait up to 100 seconds whenever an element is looked up
browser.get('https://www.zhihu.com/explore')
element = browser.find_element_by_class_name('zu-top-add-question')
print(element)

Explicit wait

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Chrome()
browser.get('https://www.taobao.com/')
wait = WebDriverWait(browser, 100)    # give up after 100 seconds
search_input = wait.until(EC.presence_of_element_located((By.ID, 'q')))
button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '.btn-search')))
print(search_input, button)

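Neither kind of wait sleeps for the full timeout: both return as soon as the element or condition is available, and the timeout is only an upper bound. The difference is that an explicit wait needs a condition to wait for, such as a particular element being present, while an implicit wait applies to every element lookup.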
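The "return as soon as the condition holds" behaviour is easiest to see in a stripped-down polling loop. The function below is a simplified model written for illustration, not Selenium's own code: wait_until is a made-up name, it raises the built-in TimeoutError where WebDriverWait raises its own TimeoutException, and the 0.5 second default mirrors WebDriverWait's default poll frequency.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result  # success: return immediately, without waiting out the timeout
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} seconds".format(timeout))
        time.sleep(poll_interval)
```

A condition that becomes true after one second therefore costs roughly one second, even when the timeout is set to 100.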

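Common conditions

title_is: the page title equals the expected string
title_contains: the page title contains the expected string
presence_of_element_located: the element has been added to the DOM tree; it is not necessarily visible
visibility_of_element_located: the element is visible, meaning it is not hidden and its width and height are both non-zero
visibility_of: same check as the previous one, but takes an already located element instead of a locator
presence_of_all_elements_located: at least one matching element is present in the DOM tree. For example, if n elements on the page have the class 'column-md-3', the condition is met as soon as one of them exists, and it returns the list of matching elements
text_to_be_present_in_element: the element's text contains the expected string
text_to_be_present_in_element_value: the element's value attribute contains the expected string
frame_to_be_available_and_switch_to_it: the frame can be switched into; if so, switches into it and returns True, otherwise returns False
invisibility_of_element_located: the element is either absent from the DOM tree or not visible
element_to_be_clickable: the element is both displayed and enabled, which is what makes it clickable
staleness_of: waits until the element is no longer attached to the DOM; this one also returns True or False
element_to_be_selected: the element is selected; typically used with drop-down list options
element_located_to_be_selected: same check as the previous one, but takes a locator instead of an element
element_selection_state_to_be: the element's selected state matches the expected state
element_located_selection_state_to_be: same as the previous one, but takes a locator instead of an already located element
alert_is_present: an alert is present on the page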

wait.until(EC.text_to_be_present_in_element_value(('id', 'inputSearchCity'), u'西安'))
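When none of the built-in conditions fits, wait.until() also accepts any callable that takes the driver and returns a truthy value once the wait should end. As a sketch, the factory below waits until a minimum number of elements match a locator; element_count_at_least is a hypothetical name, not part of expected_conditions.

```python
def element_count_at_least(locator, minimum):
    """Build a wait condition that succeeds once at least `minimum` elements match."""

    def condition(driver):
        elements = driver.find_elements(*locator)  # locator is a (strategy, value) pair
        return elements if len(elements) >= minimum else False

    return condition
```

Usage would be something like items = wait.until(element_count_at_least((By.CSS_SELECTOR, '.item'), 10)), where '.item' stands in for whatever selector the target page uses.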

Browser back and forward

Use back() and forward():

import time
from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.baidu.com/')
browser.get('https://www.taobao.com/')
browser.back()
time.sleep(1)
browser.forward()
browser.close()

Cookie operations

get_cookies()

delete_all_cookies()

add_cookie()

from selenium import webdriver

browser = webdriver.Chrome()
browser.get('https://www.zhihu.com/explore')
print(browser.get_cookies())
browser.add_cookie({'name': 'name', 'domain': 'www.zhihu.com', 'value': 'zhaofan'})
print(browser.get_cookies())
browser.delete_all_cookies()
print(browser.get_cookies())
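In crawlers, cookies collected by the browser are often handed to a plain HTTP client for the bulk of the requests. get_cookies() returns a list of dicts that each carry at least 'name' and 'value' keys, so the conversion is short. The two helper names below are chosen for this sketch.

```python
def cookies_to_dict(cookies):
    """Reduce Selenium's list of cookie dicts to a simple {name: value} mapping."""
    return {cookie["name"]: cookie["value"] for cookie in cookies}


def cookie_header(cookies):
    """Format the cookies as the value of an HTTP Cookie request header."""
    return "; ".join(
        "{}={}".format(cookie["name"], cookie["value"]) for cookie in cookies
    )
```

The dict form can be passed to HTTP libraries that accept a cookies mapping, and the header form can be set directly on a request.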

Tab management

Omitted for now.

Exception handling

Omitted for now.

