Python Web Scraping for Beginners (12): Finding Tags by id and class

Reposted article, 2020-06-22 11:08. Category: Python crawlers

This chapter shows how to find tags by their id and class attributes. Suppose we have the following HTML document:

<html>
<head>
<title>A simple example page</title>
</head>
<body>
<div>
<p class="inner-text first-item" id="first">
First paragraph.
</p>
<p class="inner-text">
Second paragraph.
</p>
</div>
<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>
<p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>
</body>
</html>

The document above is available at https://kevinhwu.github.io/demo/python-scraping/simple2.html. Let's first download the page and create a BeautifulSoup object:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://kevinhwu.github.io/demo/python-scraping/simple2.html")
soup = BeautifulSoup(page.content, 'html.parser')
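
If the URL is unreachable, the examples in this chapter can still be followed by parsing the sample document from a string. The sketch below assumes the inline copy matches what the URL serves; the name HTML is introduced here for illustration only.

```python
from bs4 import BeautifulSoup

# Inline copy of the sample document shown above (assumed to match
# the page served at the URL).
HTML = """<html>
<head>
<title>A simple example page</title>
</head>
<body>
<div>
<p class="inner-text first-item" id="first">
First paragraph.
</p>
<p class="inner-text">
Second paragraph.
</p>
</div>
<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>
<p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>
</body>
</html>"""

# Build the soup from the string instead of page.content.
soup = BeautifulSoup(HTML, "html.parser")

title = soup.title.string            # text of the <title> tag
paragraphs = soup.find_all("p")      # every <p> tag in the document
print(title, len(paragraphs))
```

The resulting soup object should behave the same way in the searches that follow, since find_all only looks at the parsed tree.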

Finding tags by class

Finding tags by id or class still uses the find_all method. The following example finds the p tags whose class list includes outer-text:

soup.find_all('p', class_='outer-text')

Output

[<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>, <p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>]

The next example finds any tag, whatever its name, whose class list includes outer-text:

soup.find_all(class_="outer-text")

Output

[<p class="outer-text first-item" id="second">
<b>
First outer paragraph.
</b>
</p>, <p class="outer-text">
<b>
Second outer paragraph.
</b>
</p>]
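
Both outputs above include the paragraph with class="outer-text first-item", because class_ is compared against each individual class of a tag. A minimal sketch of that behaviour follows, using a condensed copy of the sample paragraphs (an assumption made to keep the example self-contained) and a CSS selector via select() for the case where every listed class must be present.

```python
from bs4 import BeautifulSoup

# Condensed version of the four sample paragraphs (whitespace trimmed).
html = """
<p class="inner-text first-item" id="first">First paragraph.</p>
<p class="inner-text">Second paragraph.</p>
<p class="outer-text first-item" id="second"><b>First outer paragraph.</b></p>
<p class="outer-text"><b>Second outer paragraph.</b></p>
"""
soup = BeautifulSoup(html, "html.parser")

# class_ matches any single class on the tag, so both outer paragraphs match.
outer = soup.find_all("p", class_="outer-text")
outer_ids = [p.get("id") for p in outer]

# A CSS selector with two classes requires both to be present.
both = soup.select("p.outer-text.first-item")
both_ids = [p.get("id") for p in both]

# first-item appears on one inner and one outer paragraph.
first_items = soup.find_all(class_="first-item")
first_item_ids = [p.get("id") for p in first_items]

print(outer_ids, both_ids, first_item_ids)
```

Choosing between find_all(class_=...) and select() is mostly a matter of taste; the selector form tends to read better once more than one class is involved.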

Finding tags by id

Tags can also be found by id:

soup.find_all(id="first")

Output

[<p class="inner-text first-item" id="first">
First paragraph.
</p>]
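
Since an id is meant to be unique within a page, a single-result lookup is often more convenient than a one-element list. The sketch below, which assumes a one-paragraph excerpt of the sample document, uses find() and the CSS selector form select_one("#first"); both return a single tag, or None when nothing matches.

```python
from bs4 import BeautifulSoup

# One-paragraph excerpt of the sample document.
html = '<div><p class="inner-text first-item" id="first">First paragraph.</p></div>'
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag rather than a list.
tag = soup.find(id="first")

# The same lookup written as a CSS id selector.
same = soup.select_one("#first")

# A lookup that matches nothing yields None, so check before using it.
missing = soup.find(id="no-such-id")

text = tag.get_text(strip=True)   # tag text with surrounding whitespace removed
classes = tag["class"]            # class is multi-valued, so this is a list
print(text, classes, missing)
```

Checking for None before calling methods on the result avoids an AttributeError on pages where the id is absent.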
