Python Web Scraping for Beginners (12): Finding Tags by id and class

This article is a repost from another source. Published 2020-06-22 11:08. Category: Python web scraping.

This chapter shows how to find tags by their id and class attributes. Suppose we have the following HTML document:

<html>
<head>
<title>A simple example page</title>
</head>
<body>
<div>
<p class="inner-text first-item" id="first">
First paragraph.
</p>
<p class="inner-text">
Second paragraph.
</p>
</div>
<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>
<p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>
</body>
</html>

The document above is available at https://kevinhwu.github.io/demo/python-scraping/simple2.html. First, download the page and create a BeautifulSoup object:

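import requests
from bs4 import BeautifulSoup

# Download the example page and stop with an error if the request failed
page = requests.get("https://kevinhwu.github.io/demo/python-scraping/simple2.html")
page.raise_for_status()
# Parse the raw bytes with Python's built-in HTML parser
soup = BeautifulSoup(page.content, 'html.parser')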
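The requests call needs network access to the demo page. To experiment offline, a minimal sketch like the one below builds an equivalent soup from a string instead. The variable name HTML and the condensed markup are assumptions of this sketch, not part of the original tutorial: the tag structure matches the document above, and only the whitespace inside the tags is trimmed.

```python
from bs4 import BeautifulSoup

# Assumption: a condensed copy of the example HTML above (whitespace trimmed)
HTML = """
<html><head><title>A simple example page</title></head>
<body>
<div>
<p class="inner-text first-item" id="first">First paragraph.</p>
<p class="inner-text">Second paragraph.</p>
</div>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
</body></html>
"""

# Parse the string directly instead of downloading it with requests
soup = BeautifulSoup(HTML, "html.parser")

print(soup.title.string)        # A simple example page
print(len(soup.find_all("p")))  # 4
```

The queries in the rest of this chapter return the same tags on this soup; only text extracted without stripping would differ, because the whitespace is gone.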

Finding tags by class

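Finding tags by id or class still uses the find_all method. Note that the keyword argument is class_, with a trailing underscore, because class is a reserved word in Python. The following example finds the p tags whose class is outer-text: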

soup.find_all('p', class_='outer-text')

Output

[<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>, <p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>]
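Why does the first outer paragraph match even though its class attribute is "outer-text first-item"? BeautifulSoup treats class as a multi-valued attribute: it stores the value as a list of class names, and class_ matches when any one of those names equals the search string. A small sketch, using a condensed copy of the example HTML (an assumption of this sketch; whitespace trimmed), makes this visible:

```python
from bs4 import BeautifulSoup

# Assumption: condensed copy of the example HTML from this chapter
HTML = """
<html><head><title>A simple example page</title></head>
<body>
<div>
<p class="inner-text first-item" id="first">First paragraph.</p>
<p class="inner-text">Second paragraph.</p>
</div>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# BeautifulSoup splits the class attribute into individual class names
first_outer = soup.find("p", id="second")
print(list(first_outer["class"]))  # ['outer-text', 'first-item']

# class_ matches if ANY of a tag's class names equals the value,
# so "first-item" matches one paragraph inside the div and one outside it
first_items = soup.find_all("p", class_="first-item")
print([p["id"] for p in first_items])  # ['first', 'second']
```

Because the match is per class name, you do not need to know every class a tag carries; one distinctive name is enough.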

The next example finds every tag, of any type, whose class is outer-text:

soup.find_all(class_="outer-text")

Output

[<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>, <p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>]
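Leaving out the tag name makes find_all search every tag type. As an alternative sketch, the same query can be written as a CSS selector with soup.select, and get_text(strip=True) extracts just the paragraph text without the <b> markup. The condensed HTML string is again an assumption of this sketch.

```python
from bs4 import BeautifulSoup

# Assumption: condensed copy of the example HTML from this chapter
HTML = """
<html><head><title>A simple example page</title></head>
<body>
<div>
<p class="inner-text first-item" id="first">First paragraph.</p>
<p class="inner-text">Second paragraph.</p>
</div>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all with only class_ searches tags of every type
by_find_all = soup.find_all(class_="outer-text")

# The CSS selector ".outer-text" expresses the same query
by_select = soup.select(".outer-text")

# Both calls return the very same Tag objects, in document order
print(all(a is b for a, b in zip(by_find_all, by_select)))  # True

# get_text(strip=True) drops surrounding whitespace and the <b> markup
texts = [tag.get_text(strip=True) for tag in by_find_all]
print(texts)  # ['First outer paragraph.', 'Second outer paragraph.']
```

Which style to use is mostly a matter of taste; select is handy when the query combines several conditions, such as "p.outer-text.first-item".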

Finding tags by id

You can also find tags by id:

soup.find_all(id="first")

Output

[<p class="inner-text first-item" id="first">
First paragraph.
</p>]
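Since an id is meant to be unique within a page, find_all(id=...) usually returns a one-element list. A minimal sketch of two alternatives, again on a condensed copy of the example HTML (an assumption of this sketch): find returns the Tag itself, or None when nothing matches, and select_one("#first") is the CSS-selector form.

```python
from bs4 import BeautifulSoup

# Assumption: condensed copy of the example HTML from this chapter
HTML = """
<html><head><title>A simple example page</title></head>
<body>
<div>
<p class="inner-text first-item" id="first">First paragraph.</p>
<p class="inner-text">Second paragraph.</p>
</div>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all always returns a list, even when only one tag matches
matches = soup.find_all(id="first")
print(len(matches))  # 1

# find returns the first matching Tag directly
first = soup.find(id="first")
print(first.get_text(strip=True))  # First paragraph.

# select_one("#first") is the CSS-selector equivalent and finds the same Tag
same = soup.select_one("#first")
print(first is same)  # True

# When no tag has the id, find returns None instead of raising an error
print(soup.find(id="missing"))  # None
```

Checking for None before using the result of find avoids an AttributeError when a page does not contain the expected id.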
