A collection of problems encountered with find() and find_all() when scraping

Reposted article, 2020-11-17 00:05. Topic: python爬蟲 (Python web scraping)

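from bs4 import BeautifulSoup
lxml parses the HTML with the lxml parser, e.g. BeautifulSoup(html, 'lxml') # note: html5lib is the most tolerant of malformed markup
find returns the first matching tag
find_all returns every matching tag as a list
limit sets the maximum number of tags find_all returns
attrs holds a tag's attributes as a dictionary
string gets the non-tag string (the value) inside a tag and returns it as a string; if the tag has more than one child, it returns None
strings gets all non-tag strings under a tag and returns a generator
stripped_strings gets all non-tag strings under a tag with whitespace stripped and returns a generator
get_text # gets all non-tag strings under a tag and returns them joined as one string
contents and children both return a tag's direct children, strings included. contents returns a list, children returns an iterator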
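
To make the differences between these accessors concrete, here is a minimal sketch. The sample markup and variable names are made up for illustration, and it uses the standard-library 'html.parser' instead of lxml so that nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the original article.
html = "<div id='box'><p>first</p><p> second <b>bold</b> </p></div>"
soup = BeautifulSoup(html, "html.parser")  # assumption: stdlib parser instead of lxml

box = soup.find("div")
first_p, second_p = box.find_all("p")

single = first_p.string                   # exactly one string child, so that string
mixed = second_p.string                   # several children, so None
raw = list(second_p.strings)              # every string, whitespace kept
clean = list(second_p.stripped_strings)   # whitespace-only strings dropped, the rest stripped
text = box.get_text()                     # all strings joined into one str
child_names = [c.name for c in box.contents]  # direct children only

print(single, mixed, raw, clean, repr(text), child_names)
```

Note that string is the fragile one: as soon as a tag contains more than one child it gives None, so get_text or stripped_strings is usually the safer choice for mixed content.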

First problem: .text

soup.find_all().text

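Error message:
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

find_all() returns a ResultSet, which is a list of tags rather than a single tag, so .text cannot be called on it directly. Pick one element by index, or loop over the results, before reading .text.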
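
A small sketch of the failure and two ways around it. The markup is a made-up sample, and 'html.parser' is used here only to keep the example dependency-free.

```python
from bs4 import BeautifulSoup

html = "<p>one</p><p>two</p><p>three</p>"  # hypothetical sample markup
soup = BeautifulSoup(html, "html.parser")

paragraphs = soup.find_all("p")  # ResultSet: behaves like a list of Tag objects

try:
    paragraphs.text              # the mistake: .text on the whole ResultSet
    failed = False
except AttributeError:
    failed = True

texts = [p.text for p in paragraphs]  # read .text from each Tag instead
first_text = soup.find("p").text       # or use find() when one tag is enough

print(failed, texts, first_text)
```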

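Goal: return the content of the second tag

Method 1: use limit to cap the number of tags returned

p = soup.find_all("p", limit=2)[1] # limit=2 stops after two matches; index 1 takes the second one from the list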
print(p.text)
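
Indexing with [1] raises IndexError when the page has fewer than two matching tags. One way to guard against that is sketched below; nth_tag is a hypothetical helper written for this example, not part of BeautifulSoup, and the markup is made up.

```python
from bs4 import BeautifulSoup

html = "<p>one</p><p>two</p><p>three</p>"  # hypothetical sample markup
soup = BeautifulSoup(html, "html.parser")

limited = soup.find_all("p", limit=2)  # stops searching after two matches
second = limited[1]


def nth_tag(soup, name, n):
    """Hypothetical helper: return the n-th (1-based) tag called name, or None."""
    if n < 1:
        return None
    found = soup.find_all(name, limit=n)
    return found[n - 1] if len(found) >= n else None


print(second.text)
print(nth_tag(soup, "p", 4))  # fewer than four <p> tags, so None instead of an error
```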

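Method 2: find the containing element by its class first, then take the p tag inside it

d = soup.find(class_="wy_contMain fontSt")
p = d.find("p") # first p tag inside that container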
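
The same idea as a runnable sketch. The class name comes from the snippet above, but the surrounding markup is invented for illustration. Two details worth knowing: passing a string with a space to class_ matches the exact value of the class attribute, so the class order must match, and find() returns None when nothing matches, so it is worth checking before chaining. The select_one call shows a CSS selector alternative that does not depend on class order.

```python
from bs4 import BeautifulSoup

# Hypothetical page: a stray <p> comes before the container we care about.
html = """
<p>site banner</p>
<div class="wy_contMain fontSt">
  <p>article body</p>
  <p>more text</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

container = soup.find(class_="wy_contMain fontSt")  # exact class string match
body = container.find("p") if container is not None else None  # guard against None

# Alternative: CSS selector, matches both classes in any order.
same = soup.select_one("div.wy_contMain.fontSt p")

missing = soup.find(class_="no-such-class")  # no match, so None

print(body.text, same.text, missing)
```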
