Python Web Scraping for Beginners (12): Finding Tags by id and class


This chapter shows how to find tags by id and class. Suppose we have the following HTML document:

<html>
<head>
<title>A simple example page</title>
</head>
<body>
<div>
<p class="inner-text first-item" id="first">
First paragraph.
</p>
<p class="inner-text">
Second paragraph.
</p>
</div>
<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>
<p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>
</body>
</html>

The document above is available at https://kevinhwu.github.io/demo/python-scraping/simple2.html. Let's first download the page and create a BeautifulSoup object:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://kevinhwu.github.io/demo/python-scraping/simple2.html")
soup = BeautifulSoup(page.content, 'html.parser')
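If the URL is unreachable, the same object can be built from a string instead. The following is a minimal sketch that assumes the page content matches the HTML listed above (compressed onto fewer lines here); the names HTML, title and p_count are made up for this illustration.

```python
from bs4 import BeautifulSoup

# Assumption: a local copy of the sample page shown above.
HTML = """
<html>
<head><title>A simple example page</title></head>
<body>
<div>
<p class="inner-text first-item" id="first">First paragraph.</p>
<p class="inner-text">Second paragraph.</p>
</div>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
</body>
</html>
"""

# Parse the string with the same parser used for the downloaded page.
soup = BeautifulSoup(HTML, "html.parser")

# Quick sanity checks that the document was parsed as expected.
title = soup.title.get_text()
p_count = len(soup.find_all("p"))

print(title)
print(p_count)
```

Every call in the rest of this chapter should behave the same way on this soup object as on the one built from the downloaded page.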

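Finding tags by class

Finding tags by id or class still uses the find_all method. Because class is a reserved word in Python, the keyword argument is spelled class_. The following example finds the p tags whose class is outer-text: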

soup.find_all('p', class_='outer-text')

Output

[<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>, <p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>]

The following example finds any tag whose class is outer-text:

soup.find_all(class_="outer-text")

Output

[<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>, <p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>]
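Both results above include the paragraph whose class attribute is "outer-text first-item". This is because class is a multi-valued attribute, and class_ matches if any one of the values is equal to the string given. The sketch below illustrates this on a shortened copy of the sample page, and uses a CSS selector via select as one possible way to require two classes at once. The variable names are chosen only for this example.

```python
from bs4 import BeautifulSoup

# Assumption: a shortened copy of the sample page's body.
html = """
<div>
<p class="inner-text first-item" id="first">First paragraph.</p>
<p class="inner-text">Second paragraph.</p>
</div>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches any single value of the class attribute, so both
# outer paragraphs are found; the second one has no id attribute.
outer = soup.find_all("p", class_="outer-text")
outer_ids = [p.get("id") for p in outer]

# Matching on first-item alone crosses the inner/outer boundary.
first_items = soup.find_all(class_="first-item")
first_item_ids = [p["id"] for p in first_items]

# A CSS selector can require both classes on the same p tag.
both = soup.select("p.outer-text.first-item")
both_ids = [p["id"] for p in both]

print(outer_ids)
print(first_item_ids)
print(both_ids)
```

Passing the whole string "outer-text first-item" to class_ also works, but only when the attribute is written in exactly that order, so the selector form is usually less fragile.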

Finding tags by id

Tags can also be found by id:

soup.find_all(id="first")

Output

[<p class="inner-text first-item" id="first">
First paragraph.
</p>]
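Since an id is normally unique within a page, a list is often more than is needed. As a small sketch (again on a shortened copy of the sample page, with variable names invented for the example), find returns the first matching tag directly, or None when nothing matches, and select_one accepts the CSS form #first.

```python
from bs4 import BeautifulSoup

# Assumption: a shortened copy of the sample page's body.
html = """
<div>
<p class="inner-text first-item" id="first">First paragraph.</p>
<p class="inner-text">Second paragraph.</p>
</div>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
"""
soup = BeautifulSoup(html, "html.parser")

# find returns a single Tag instead of a list.
first = soup.find(id="first")
first_text = first.get_text(strip=True)

# The same lookup written as a CSS selector.
same = soup.select_one("#first")
same_id = same["id"]

# A tag name and an id can be combined, as with class_.
second = soup.find("p", id="second")
second_text = second.get_text(strip=True)

# When no tag has the id, find returns None rather than raising.
missing = soup.find(id="no-such-id")

print(first_text)
print(second_text)
print(missing)
```

Checking for None before using the result avoids an AttributeError on pages where the id is absent.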

 

