Basic Usage of BeautifulSoup

Reposted article, 2022-03-30 10:08, python

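Beautiful Soup is a Python library for extracting data from HTML and XML files. It works with the parser of your choice to provide idiomatic ways of navigating, searching, and modifying the document.

It is a flexible and convenient web-page parsing library that is efficient and supports several parsers.

With it, you can extract information from web pages without writing regular expressions.

People often call Beautiful Soup "delicious soup, rich green soup", or "beautiful (tasty) soup" for short.

Official documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html (Chinese)

https://www.crummy.com/software/BeautifulSoup/bs4/doc/ (English)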

Installation

Quick install

pip install beautifulsoup4 or easy_install BeautifulSoup4
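A quick way to confirm that the install worked is to import the package and print its version. This is only a minimal check; the variable name version is chosen here for illustration.

```python
import bs4

# bs4 is the import name of the beautifulsoup4 package
version = bs4.__version__
print("Beautiful Soup version:", version)
```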

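Parsers

Beautiful Soup supports the HTML parser in Python's standard library as well as several third-party parsers. If no third-party parser is installed, Beautiful Soup falls back to Python's built-in parser (html.parser). The lxml parser is more capable and faster, so installing it is recommended.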
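The parser choice described above can be made explicit in code. The sketch below assumes you want lxml when it is installed and the standard library's html.parser otherwise; make_soup is a hypothetical helper name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Hypothetical helper: prefer lxml, fall back to html.parser."""
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # Raised when the requested parser (lxml) is not installed;
        # html.parser ships with Python, so it is always available.
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p class='title'><b>The Dormouse's story</b></p>")
print(soup.b.string)
```

Naming the parser explicitly also keeps results consistent across machines, since different parsers can build slightly different trees from malformed markup.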

# -*- coding:utf-8 -*-

from bs4 import BeautifulSoup

html = """
<html><head><title>haha,The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""

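soup = BeautifulSoup(html, 'lxml')  # requires the lxml package to be installed
print(soup)                   # print the parsed document
print(soup.prettify())        # print the document with indentation
print(soup.title)             # the <title> tag, e.g. <title>haha,The Dormouse's story</title>
print(soup.title.string)      # the text inside the <title> tag
print(soup.title.parent.name) # the name of the <title> tag's parent, i.e. head

print(soup.find_all('a'))     # a list of all <a> tags
print(soup.find(id='link3'))  # the first tag whose id is 'link3', with its contents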
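Once find_all has returned the tags, the usual next step is reading their attributes and text. The sketch below reuses the three links from the sample document and uses html.parser so that it runs without lxml; the names links and is_comment are chosen here for illustration. It also shows one thing to watch for: the first link contains only an HTML comment, so its .string is a Comment object rather than ordinary text.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
</p>
"""

soup = BeautifulSoup(html, "html.parser")

links = []
# class_ (with a trailing underscore) filters by CSS class
for a in soup.find_all("a", class_="sister"):
    # a.string is the tag's single child string; it may be a Comment
    is_comment = isinstance(a.string, Comment)
    # Attributes are read like dictionary keys; .get() returns None if missing
    links.append((a["id"], a.get("href"), is_comment))

for link_id, href, is_comment in links:
    print(link_id, href, "comment" if is_comment else "text")
```

Checking for Comment before using .string avoids treating commented-out markup as visible link text.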
