HTML Tag Traversal Methods with the bs4 Library

import requests
r=requests.get('http://python123.io/ws/demo.html')
demo=r.text

Basic HTML structure

An HTML document can be viewed as a tree of tags.
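To make the "tree of tags" idea concrete without bs4, here is a minimal sketch using only the standard library's html.parser. The TagTreePrinter class is a hypothetical helper, and it assumes well-formed markup with no void elements such as <br> (those have no end tag, so the depth count would drift).

```python
from html.parser import HTMLParser


class TagTreePrinter(HTMLParser):
    """Hypothetical helper: record each start tag, indented by its nesting depth.

    Assumes well-formed HTML without void elements like <br>.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # A start tag opens a new level in the tree
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        # An end tag closes the current level
        self.depth -= 1


HTML = "<html><head><title>Demo</title></head><body><p><b>hi</b></p></body></html>"
printer = TagTreePrinter()
printer.feed(HTML)
print("\n".join(printer.lines))
```

Each level of indentation is one level deeper in the tag tree, which is exactly the structure that the bs4 attributes below move up, down, and across.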

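Traversal methods

Downward traversal

Attribute	Description
.contents	List of all of the tag's direct children
.children	Iterator over the direct children; like .contents, used to loop over them
.descendants	Iterator over all descendants (children, grandchildren, and so on), used to loop over the whole subtree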

import requests
from bs4 import BeautifulSoup

r=requests.get('http://python123.io/ws/demo.html')
demo=r.text
soup=BeautifulSoup(demo,'html.parser')
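print(soup.contents)  # the top-level children of the whole document
print(soup.body.contents)  # list of the direct children of <body>
print(soup.head)  # the <head> tag
print(len(soup.body.contents))  # number of direct children of <body>
print(soup.body.contents[1])  # first child tag of <body>; index 0 is normally a '\n' text node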
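The same .contents operations can be checked offline. This is a minimal sketch assuming bs4 is installed; the HTML string is a hypothetical stand-in modeled on demo.html, not the page itself. It shows why the first child tag of <body> sits at index 1: the line break after <body> is itself a text node.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for demo.html, so the example runs without network access
HTML = (
    "<html><head><title>Demo</title></head>\n"
    "<body>\n"
    '<p class="title"><b>Demo page</b></p>\n'
    '<p class="course">Courses:\n'
    '<a id="link1">Basic</a> and <a id="link2">Advanced</a>.</p>\n'
    "</body></html>"
)

soup = BeautifulSoup(HTML, "html.parser")
body_children = soup.body.contents  # a plain Python list of direct children

print(len(body_children))  # 5: three '\n' text nodes and two <p> tags
print(body_children[0] == "\n")  # True: index 0 is whitespace, not a tag
print(body_children[1].name)  # p: the first child tag sits at index 1

# .children yields the same nodes one at a time instead of handing back the list
print(list(soup.body.children) == body_children)  # True
```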

Looping over children and descendants

import requests
from bs4 import BeautifulSoup

r=requests.get('http://python123.io/ws/demo.html')
demo=r.text
soup=BeautifulSoup(demo,'html.parser')

for child in soup.body.children:  # iterate over the direct children of <body>
    print(child)

for child in soup.body.descendants:  # iterate over every descendant of <body>, text nodes included
    print(child)
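Because .descendants yields text nodes as well as tags, a loop usually filters by type. A minimal sketch, assuming bs4 is installed and using a hypothetical inline stand-in for demo.html; Tag and NavigableString are the bs4 classes for element nodes and text nodes.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical stand-in for demo.html, so the example runs without network access
HTML = (
    "<html><head><title>Demo</title></head>\n"
    "<body>\n"
    '<p class="title"><b>Demo page</b></p>\n'
    '<p class="course">Courses:\n'
    '<a id="link1">Basic</a> and <a id="link2">Advanced</a>.</p>\n'
    "</body></html>"
)
soup = BeautifulSoup(HTML, "html.parser")

children = list(soup.body.children)
descendants = list(soup.body.descendants)
print(len(children), len(descendants))  # 5 14

# Keep only element nodes, in depth-first document order
tag_names = [node.name for node in descendants if isinstance(node, Tag)]
print(tag_names)  # ['p', 'b', 'p', 'a', 'a']

# Keep only text nodes that contain something besides whitespace
texts = [node.strip() for node in descendants
         if isinstance(node, NavigableString) and node.strip()]
print(texts)  # ['Demo page', 'Courses:', 'Basic', 'and', 'Advanced', '.']
```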

Upward traversal

Attribute	Description
.parent	The node's parent tag
.parents	Iterator over the node's ancestors, used to loop upward through them

import requests
from bs4 import BeautifulSoup

r=requests.get('http://python123.io/ws/demo.html')
demo=r.text
soup=BeautifulSoup(demo,'html.parser')
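print(soup.title.parent)  # the parent of <title>, i.e. the <head> tag
print(soup.html.parent)  # the parent of <html> is the BeautifulSoup object, so the whole document is printed
print(soup.parent)  # the BeautifulSoup object has no parent: prints None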
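The top of the upward chain can be checked directly. A minimal sketch, assuming bs4 is installed and using a small hypothetical HTML string: the <html> tag's parent is the BeautifulSoup object, whose name is '[document]', and that object's own parent is None.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, small enough to see the whole chain
HTML = "<html><head><title>Demo</title></head><body><p>text</p></body></html>"
soup = BeautifulSoup(HTML, "html.parser")

print(soup.title.parent.name)  # head
print(soup.html.parent is soup)  # True: the top-level tag hangs off the BeautifulSoup object
print(soup.name)  # [document]: the name bs4 gives the root object
print(soup.parent)  # None: nothing sits above the root
```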

import requests
from bs4 import BeautifulSoup

r=requests.get('http://python123.io/ws/demo.html')
demo=r.text
soup=BeautifulSoup(demo,'html.parser')

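for parent in soup.a.parents:  # walk up through the ancestors of the first <a> tag
    if parent is None:  # defensive check; .parents normally stops at the BeautifulSoup object
        print(parent)
    else:
        print(parent.name)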
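Collecting the names from .parents gives the tag's path from the root. A minimal sketch, assuming bs4 is installed; the markup and the tag_path helper are hypothetical, written only to show how the ancestors come back nearest first and need reversing for a root-to-leaf path.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with <a> nested inside <p> inside <body>
HTML = '<html><body><p>Courses: <a id="link1">Basic</a></p></body></html>'
soup = BeautifulSoup(HTML, "html.parser")

ancestors = [parent.name for parent in soup.a.parents]
print(ancestors)  # ['p', 'body', 'html', '[document]'], nearest ancestor first


def tag_path(tag):
    """Hypothetical helper: build an 'html > body > p > a' style path for a tag."""
    names = [p.name for p in tag.parents if p.name != "[document]"]
    return " > ".join([*reversed(names), tag.name])


print(tag_path(soup.a))  # html > body > p > a
```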

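Sibling (parallel) traversal

Attribute	Description
.next_sibling	The next sibling node in HTML document order
.previous_sibling	The previous sibling node in HTML document order
.next_siblings	Iterator over all following sibling nodes in document order
.previous_siblings	Iterator over all preceding sibling nodes in document order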

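Note

Sibling traversal of the tag tree is conditional.
It only moves between nodes that share the same parent.
The text inside a tag is also a node (a NavigableString), so a sibling can be a string rather than a tag.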
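Both conditions show up in a small example. A minimal sketch, assuming bs4 is installed; the two-paragraph markup is hypothetical. The text between the links is a sibling of each <a>, and the second <p> is never reached from inside the first one.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup: two paragraphs inside one <div>
HTML = ('<div><p>Courses: <a id="link1">Basic</a> and '
        '<a id="link2">Advanced</a>.</p><p>Second paragraph</p></div>')
soup = BeautifulSoup(HTML, "html.parser")
first_a = soup.a

# The text between the two links is itself a sibling node
print(isinstance(first_a.next_sibling, NavigableString))  # True
print(first_a.next_sibling.next_sibling["id"])  # link2

# Siblings stay inside the same parent: the walk from <a> ends at the '.' text
labels = [str(s) if isinstance(s, NavigableString) else s.name
          for s in first_a.next_siblings]
print(labels)  # [' and ', 'a', '.']

# The second <p> is a sibling of the first <p>, not of the <a>
print(soup.p.next_sibling.name)  # p
```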

import requests
from bs4 import BeautifulSoup

r=requests.get('http://python123.io/ws/demo.html')
demo=r.text
soup=BeautifulSoup(demo,'html.parser')

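print(soup.a.next_sibling)  # the node after the first <a>; in demo.html this is the text ' and ', not a tag
print(soup.a.next_sibling.next_sibling)  # the node after that: the second <a> tag
print(soup.a.previous_sibling)  # the node before the first <a>: the paragraph's leading text
print(soup.a.previous_sibling.previous_sibling)  # None, because that text is the first child of <p>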
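Chaining .next_sibling twice only works when you know exactly one text node sits in between. A more robust approach is to skip text nodes in a loop. This is a minimal sketch assuming bs4 is installed; next_tag_sibling and previous_tag_sibling are hypothetical helpers, not bs4 methods, and the markup is a hypothetical stand-in for the demo paragraph.

```python
from bs4 import BeautifulSoup, Tag


def next_tag_sibling(node):
    """Hypothetical helper: the next sibling that is a Tag, skipping text nodes."""
    for sibling in node.next_siblings:
        if isinstance(sibling, Tag):
            return sibling
    return None  # no tag follows within the same parent


def previous_tag_sibling(node):
    """Hypothetical helper: the previous sibling that is a Tag, skipping text nodes."""
    for sibling in node.previous_siblings:
        if isinstance(sibling, Tag):
            return sibling
    return None  # no tag precedes within the same parent


# Hypothetical stand-in for the course paragraph in demo.html
HTML = ('<p class="course">Courses:\n<a id="link1">Basic</a> and '
        '<a id="link2">Advanced</a>.</p>')
soup = BeautifulSoup(HTML, "html.parser")
first_a, second_a = soup.find_all("a")

print(next_tag_sibling(first_a)["id"])  # link2
print(previous_tag_sibling(second_a)["id"])  # link1
print(previous_tag_sibling(first_a))  # None: only text comes before it
```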

Looping over siblings

import requests
from bs4 import BeautifulSoup

r=requests.get('http://python123.io/ws/demo.html')
demo=r.text
soup=BeautifulSoup(demo,'html.parser')


for sibling in soup.a.next_siblings:  # iterate over the nodes after the first <a>
    print(sibling)

for sibling in soup.a.previous_siblings:  # iterate over the nodes before it, nearest first
    print(sibling)
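The plural attributes are the ones to loop over: the singular .previous_sibling returns a single node, and looping over a text node just walks its characters. Note also that .previous_siblings walks backward, nearest sibling first. A minimal sketch, assuming bs4 is installed; the list markup is hypothetical and has no whitespace between items, so every sibling is a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical list without whitespace between items, so every sibling is a tag
soup = BeautifulSoup("<ul><li>one</li><li>two</li><li>three</li></ul>", "html.parser")
items = soup.find_all("li")

after_first = [li.get_text() for li in items[0].next_siblings]
before_last = [li.get_text() for li in items[-1].previous_siblings]
print(after_first)  # ['two', 'three']
print(before_last)  # ['two', 'one']: nearest sibling first

# Reverse to get the preceding siblings back in document order
in_order = [li.get_text() for li in reversed(list(items[-1].previous_siblings))]
print(in_order)  # ['one', 'two']
```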

Indented, hierarchical output: prettify()

import requests
from bs4 import BeautifulSoup

r=requests.get('http://python123.io/ws/demo.html')
demo=r.text
soup=BeautifulSoup(demo,'html.parser')
print(soup.prettify())
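prettify() returns a string with one node per line, indented by depth, so the output mirrors the tag tree. A minimal sketch, assuming bs4 is installed and using a tiny hypothetical snippet; the exact indent width is left unchecked here since it is a formatting detail.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p><b>hi</b></p>", "html.parser")
pretty = soup.prettify()  # a str: each tag and text node on its own line
print(pretty)

# Drop any blank lines, then look at content and indentation separately
lines = [line for line in pretty.splitlines() if line.strip()]
print([line.strip() for line in lines])  # ['<p>', '<b>', 'hi', '</b>', '</p>']
indents = [len(line) - len(line.lstrip()) for line in lines]
print(indents)  # deeper nodes are indented further
```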
