Python Web Scraping: Locating Elements and Extracting Values with BeautifulSoup

Reposted article. 2018-12-01 19:35. Tags: Python, web scraping

-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-

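Ways to get a given tag and its attribute values from a web page:

　　1. Get the tag name: tag.name (tag itself is of type <class 'bs4.element.Tag'>)

　　2. Get all attributes as a dict: tag.attrs

　　3. Get a single attribute: tag.get('attribute_name') or tag['attribute_name'] (tag.get() returns None if the attribute is missing, while tag[...] raises KeyError)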
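The access styles listed above can be sketched as follows. The HTML snippet and the variable names are made up for illustration, and the built-in 'html.parser' is assumed as the parser.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only for illustration.
html = '<a class="sister big" href="http://example.com/elsie" id="link1">Elsie</a>'
soup = BeautifulSoup(html, "html.parser")
tag = soup.a                    # first <a> tag in the document

tag_type = type(tag)            # <class 'bs4.element.Tag'>
tag_name = tag.name             # 'a'
all_attrs = tag.attrs           # dict holding every attribute of the tag
href = tag["href"]              # raises KeyError if the attribute is missing
link_id = tag.get("id")         # 'link1'
missing = tag.get("title")      # None, because <a> has no title attribute
classes = tag["class"]          # class is multi-valued, so this is a list

print(tag_name, href, link_id, missing, classes)
```

Prefer tag.get() when an attribute may be absent, since it avoids wrapping every lookup in a try/except.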

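Getting a tag's content:

　　1. tag.string returns the tag's text only when the tag has exactly one child (a single string, or a single child tag that itself contains one string); otherwise it returns None

　　2. tag.get_text() returns all the strings inside the tag, including those of nested tags, joined into one string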
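A minimal sketch of the difference between .string and .get_text(), assuming a small made-up HTML fragment:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, used only for illustration.
html = "<div><p>Hello</p><p>Hi <b>there</b></p></div>"
soup = BeautifulSoup(html, "html.parser")
first_p, second_p = soup.find_all("p")

single = first_p.string          # 'Hello': the tag has exactly one child
ambiguous = second_p.string      # None: the tag has two children ('Hi ' and <b>)
joined = second_p.get_text()     # 'Hi there': all nested strings concatenated
# A separator plus strip=True trims each string before joining.
spaced = soup.div.get_text(" ", strip=True)

print(repr(single), repr(ambiguous), repr(joined), repr(spaced))
```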

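Useful BeautifulSoup features

　　1. stripped_strings

　　　　The strings in the output may contain a lot of spaces or blank lines; .stripped_strings strips the surrounding whitespace from each string and skips strings that are whitespace only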
# soup is assumed to be a BeautifulSoup object parsed from the "three sisters" sample HTML
for string in soup.stripped_strings:
    print(repr(string))
    # "The Dormouse's story"
    # "The Dormouse's story"
    # 'Once upon a time there were three little sisters; and their names were'
    # 'Elsie'
    # ','
    # 'Lacie'
    # 'and'
    # 'Tillie'
    # ';\nand they lived at the bottom of a well.'
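　　2. Pretty-print the page:

　　　　soup.prettify() returns the whole parse tree as an indented string, with each tag and each string on its own line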
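The two helpers above can be tried on a small self-contained document. This is only a sketch: the HTML string is invented for illustration, and the exact indentation produced by prettify() is not shown because it can vary between versions.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML with deliberately messy whitespace.
html = "<div>\n  <p>  first  </p>\n\n  <p>second</p>\n</div>"
soup = BeautifulSoup(html, "html.parser")

raw = list(soup.strings)             # includes whitespace-only strings
clean = list(soup.stripped_strings)  # ['first', 'second']
pretty = soup.prettify()             # indented string version of the tree

print(raw)
print(clean)
print(pretty)
```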

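Finding elements with BeautifulSoup:

　　1. find_all(class_="class") returns all matching tags, as a <class 'bs4.element.ResultSet'>

　　2. find(class_="class") returns the first matching tag, as a <class 'bs4.element.Tag'>, or None if nothing matches

　　3. select_one() takes a CSS selector and returns the first matching tag, as a <class 'bs4.element.Tag'>

　　4. select() takes a CSS selector and returns all matching tags, as a <class 'bs4.element.ResultSet'>

　　5.　soup = BeautifulSoup(backdata, 'html.parser')　　# convert the fetched HTML into a BeautifulSoup object

　　　　soup.find_all('tag_name', attrs={'attribute_name': 'attribute_value'})  # returns a list (ResultSet)

　　　　limit caps the number of results find_all returns

　　　　recursive=False makes find_all look only at the tag's direct children

　　　　soup.find_all(text=re.compile('content'))     matches on text; a regular expression allows partial (fuzzy) matches. Since Beautiful Soup 4.4.0 this argument is named string, and text is the older name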
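The lookup methods above can be compared side by side. The menu HTML below is invented for illustration, and the sketch uses the newer string= keyword in place of text=, which assumes Beautiful Soup 4.4.0 or later.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical HTML, used only for illustration.
html = """
<ul id="menu">
  <li class="item"><a href="/a">Apple pie</a></li>
  <li class="item"><a href="/b">Banana</a></li>
  <li class="item special"><a href="/c">Cherry pie</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

items = soup.find_all(class_="item")                 # all three <li> tags
first_item = soup.find(class_="item")                # only the first <li>
special_link = soup.select_one("li.special > a")     # CSS selector, first match
links = soup.select("#menu a")                       # CSS selector, all matches
by_attrs = soup.find_all("a", attrs={"href": "/b"})  # filter on an attribute value
limited = soup.find_all("li", limit=2)               # stop after two results
# The <a> tags are grandchildren of <ul>, so a non-recursive search finds none.
direct = soup.ul.find_all("a", recursive=False)
# Regex match on text: returns the matching strings, not the tags.
pies = [str(s) for s in soup.find_all(string=re.compile("pie"))]

print(len(items), first_item.a["href"], special_link["href"], len(links))
print(by_attrs, len(limited), direct, pies)
```

Note that class_="item" also matches the tag whose class attribute is "item special", because a match on any one of the classes is enough.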

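Working with child nodes:

　　1. contents

　　　　The .contents attribute returns a tag's direct children as a list

　　2. children

　　　　.children is a generator that lets you loop over a tag's direct children

　　3. descendants

　　　　.contents and .children cover only the direct children, whereas .descendants iterates over all descendants (children, grandchildren, and so on)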
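A short sketch of how the three child accessors differ, using a made-up fragment. It filters on isinstance(node, Tag) so that text nodes are left out of the name lists.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical HTML fragment, used only for illustration.
html = "<div><p>one <b>two</b></p><span>three</span></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

direct_list = div.contents                           # list with <p> and <span>
child_names = [child.name for child in div.children]  # ['p', 'span']
all_nodes = list(div.descendants)                     # tags and strings at every depth
descendant_tags = [n.name for n in all_nodes if isinstance(n, Tag)]

print(len(direct_list), child_names, len(all_nodes), descendant_tags)
```

Here .descendants yields six nodes: the three tags plus the three text strings they contain.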

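Working with parent nodes:

　　1. parent

　　　　The .parent attribute gives an element's parent node

　　2. find_parents()

　　　　Returns all matching ancestors, nearest first

　　3. find_parent()

　　　　Returns the nearest matching ancestor (the parent itself if it matches)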
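The parent lookups above can be illustrated on a nested structure. The ids "outer" and "inner" are hypothetical names chosen for this sketch.

```python
from bs4 import BeautifulSoup

# Hypothetical nested HTML, used only for illustration.
html = '<div id="outer"><div id="inner"><p><b>text</b></p></div></div>'
soup = BeautifulSoup(html, "html.parser")
b = soup.b

parent_name = b.parent.name                          # 'p', the direct parent
nearest_div = b.find_parent("div")["id"]             # 'inner', the closest <div> ancestor
all_divs = [d["id"] for d in b.find_parents("div")]  # nearest first

print(parent_name, nearest_div, all_divs)
```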

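Working with sibling nodes:

　　1. next_siblings iterates over all siblings that come after the element

　　2. previous_siblings iterates over all siblings that come before the element

　　3. find_next_siblings() returns all following siblings that match

　　4. find_next_sibling() returns the first following sibling that matches; find_previous_sibling() does the same for preceding siblings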
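A minimal sketch of sibling navigation, assuming a made-up list with no whitespace between the tags. In real pages the whitespace between tags shows up as extra string siblings, which is one reason the find_* methods with a tag name are often more convenient than the raw iterators.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, written without whitespace between tags on purpose.
html = "<ul><li>one</li><li>two</li><li class='x'>three</li><li>four</li></ul>"
soup = BeautifulSoup(html, "html.parser")
second = soup.find_all("li")[1]                     # the <li> containing 'two'

after = [li.get_text() for li in second.next_siblings]       # every later sibling
before = [li.get_text() for li in second.previous_siblings]  # every earlier sibling
next_one = second.find_next_sibling("li").get_text()         # first later <li>
next_all = [li.get_text() for li in second.find_next_siblings("li")]
prev_one = second.find_previous_sibling("li").get_text()     # first earlier <li>

print(after, before, next_one, next_all, prev_one)
```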
