The BeautifulSoup4 library and CSS selectors

Reposted from the original article · 2018-12-27 00:45 · Spider

BeautifulSoup4

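1. Installation and documentation

2. The main parsers

3. Basic usage

4. Examples of common methods: find_all()...

5. Points to distinguish

CSS selectors

7 Examples of extracting elements with select and CSS selectors

Exercise: scraping data for every city from China Weather (weather.com.cn)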

BeautifulSoup4

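Like lxml, Beautiful Soup is an HTML/XML parser. Its main job is parsing HTML/XML and extracting data from it.

lxml only traverses the parts of the document it needs, whereas Beautiful Soup is based on the HTML DOM (Document Object Model): it loads the entire document and parses the whole DOM tree into Python objects. Its time and memory overhead is therefore much larger, and its performance is lower than lxml's.

BeautifulSoup makes parsing HTML fairly simple. Its API is very user-friendly, it supports CSS selectors, and it can use the HTML parser from the Python standard library as well as lxml's XML parser.

Beautiful Soup 3 is no longer developed, and new projects should use Beautiful Soup 4. Install it with pip: pip install beautifulsoup4

Official documentation (Chinese): http://beautifulsoup.readthedocs.io/zh_CN/v4.4.0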

Scraping tool	Speed	Ease of use	Installation
Regular expressions	Fastest	Hard	None (built in)
BeautifulSoup	Slow	Easiest	Easy
lxml	Fast	Easy	Moderate

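1. Installation and documentation:

1 Installation: pip install bs4 (bs4 on PyPI is a placeholder package that installs beautifulsoup4)
2 Chinese documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html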

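Installing a parser:

Beautiful Soup supports the HTML parser in the Python standard library as well as several third-party parsers, one of which is lxml. Depending on your operating system, install lxml with one of the following commands (easy_install has since been deprecated in favor of pip):

$ apt-get install python3-lxml

$ easy_install lxml

$ pip install lxml

Another option is html5lib, a parser implemented in pure Python that parses documents the same way a web browser does. Install it with one of the following commands:

$ apt-get install python3-html5lib

$ easy_install html5lib

$ pip install html5lib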

2. The table below lists the main parsers with their advantages and disadvantages:

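Parser	Usage	Advantages	Disadvantages
Python standard library	`BeautifulSoup(markup, "html.parser")`	Built into Python; moderate speed; good fault tolerance	Poor fault tolerance in versions before Python 2.7.3 or 3.2.2
lxml HTML parser	`BeautifulSoup(markup, "lxml")`	Fast; good fault tolerance	Requires a C library
lxml XML parser	`BeautifulSoup(markup, ["lxml", "xml"])` `BeautifulSoup(markup, "xml")`	Fast; the only parser that supports XML	Requires a C library
html5lib	`BeautifulSoup(markup, "html5lib")`	Best fault tolerance; parses documents the way a browser does; produces valid HTML5	Slow; an external Python dependency that must be installed separately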
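Because lxml and html5lib are optional installs, code that asks for one of them fails with `FeatureNotFound` on a machine where it is missing. A minimal sketch, assuming you want to try the parsers from the table in order of speed and fall back to the built-in html.parser; `make_soup` is a hypothetical helper, not part of Beautiful Soup:

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, parsers=("lxml", "html5lib", "html.parser")):
    """Return a tree built with the first parser in `parsers` that is installed."""
    for parser in parsers:
        try:
            return BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # This parser is not installed (or unknown); try the next one
            continue
    raise RuntimeError("none of the requested parsers is available")


# html.parser ships with Python, so this call always succeeds
soup = make_soup("<a></p>", parsers=("html.parser",))
print(soup)  # <a></a>
```

The output also shows html.parser repairing the broken fragment `<a></p>`. lxml and html5lib may repair the same input differently, which is one reason to name the parser explicitly instead of relying on whichever one happens to be installed.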

3. Basic usage:

#encoding: utf-8

from bs4 import BeautifulSoup

html = """
<table class="tablelist" cellpadding="0" cellspacing="0">
    <tbody>
        <tr class="h">
            <td class="l" width="374">職位名稱</td>
            <td>職位類別</td>
            <td>人數</td>
            <td>地點</td>
            <td>發布時間</td>
        </tr>
        ...
"""
# pip install lxml
bs = BeautifulSoup(html,"lxml")
# Print the parse tree in prettified (indented) form
print(bs.prettify())
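Beyond prettify(), the parsed tree can be navigated directly: `soup.td` is shorthand for the first `<td>` in the document. A small sketch on a fragment of the same table, using the built-in html.parser so lxml is not required. Note that class comes back as a list, because HTML allows several class names per tag:

```python
from bs4 import BeautifulSoup

html = """
<table class="tablelist" cellpadding="0" cellspacing="0">
    <tr class="h">
        <td class="l" width="374">職位名稱</td>
        <td>職位類別</td>
    </tr>
</table>
"""

# html.parser ships with Python, so this runs without installing lxml
soup = BeautifulSoup(html, "html.parser")

td = soup.td                  # shorthand for soup.find("td"): the first <td>
print(td.name)                # td
print(td["width"])            # 374 (ordinary attribute values are strings)
print(td["class"])            # ['l'] (class may hold several names, so it is a list)
print(td.string)              # 職位名稱
print(soup.table["class"])    # ['tablelist']
```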

4. Examples of common methods:

find_all() (returns a list) · a['href'] and a.attrs['href'] · .strings · .stripped_strings · .get_text()

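# First, import the bs4 library
# 1. Get all tr tags
# 2. Get the second tr tag
# 3. Get all tr tags whose class is even
# 4. Extract all a tags whose id is test and whose class is also test
# 5. Get the href attribute of every a tag
# 6. Get all the job information (plain text)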
from bs4 import BeautifulSoup


text = """
    <table class="tablelist" cellpadding="0" cellspacing="0">
    <tbody>
        <tr class="h">
            <td class="l" width="374">職位名稱</td>
            <td>職位類別</td>
            <td>人數</td>
            <td>地點</td>
            <td>發布時間</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=33824&keywords=python&tid=87&lid=2218">22989-金融雲區塊鏈高級研發工程師（深圳）</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-25</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=29938&keywords=python&tid=87&lid=2218">22989-金融雲高級后台開發</a></td>
            <td>技術類</td>
            <td>2</td>
            <td>深圳</td>
            <td>2017-11-25</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=31236&keywords=python&tid=87&lid=2218">SNG16-騰訊音樂運營開發工程師（深圳）</a></td>
            <td>技術類</td>
            <td>2</td>
            <td>深圳</td>
            <td>2017-11-25</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=31235&keywords=python&tid=87&lid=2218">SNG16-騰訊音樂業務運維工程師（深圳）</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-25</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=34531&keywords=python&tid=87&lid=2218">TEG03-高級研發工程師（深圳）</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=34532&keywords=python&tid=87&lid=2218">TEG03-高級圖像算法研發工程師（深圳）</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=31648&keywords=python&tid=87&lid=2218">TEG11-高級AI開發工程師（深圳）</a></td>
            <td>技術類</td>
            <td>4</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=32218&keywords=python&tid=87&lid=2218">15851-后台開發工程師</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=32217&keywords=python&tid=87&lid=2218">15851-后台開發工程師</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=34511&keywords=python&tid=87&lid=2218">SNG11-高級業務運維工程師（深圳）</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
    </tbody>
</table>
"""

# htmlElement = etree.HTML(text)
# print(etree.tostring(htmlElement,encoding='utf-8').decode('utf-8'))
from bs4.element import Tag
soup = BeautifulSoup(text,'lxml')
# print(soup.prettify())

# 1. Get all tr tags
# trs = soup.find_all('tr')
# for tr in trs:
#     # Each item is of type <class 'bs4.element.Tag'>
#     # Tag (from bs4.element import Tag) implements __repr__, so printing a tag prints its HTML
#     print(type(tr),tr)
#     if tr != trs[-1]:
#         print('*'*30)

# 2. Get the second tr tag; find_all returns a list, so index into it
# trs = soup.find_all('tr',limit=2)[1]
# print(trs)

# 3. Get all tr tags whose class is even. Note the trailing underscore in class_: it avoids a clash with Python's reserved keyword class
# trs = soup.find_all('tr',class_='even')
# print(trs)
# # Alternatively:
# trs = soup.find_all('tr',attrs={'class':'even'})

# 4. Extract all a tags whose id is test and whose class is also test (this sample table has no such tag, so both calls print [])
# aList = soup.find_all('a',id='test',class_='test')
# print(aList)
# aList = soup.find_all('a',attrs={'id':'test','class':'test'})
# print(aList)

# 5. Get the href attribute of every a tag
# aList = soup.find_all('a')
# for a in aList:
#     # 1. Via subscript access
#     # href = a['href']
#     # print(href)
#     # 2. Via the attrs dictionary
#     href = a.attrs['href']
#     print(href)

# 6. Get all the job information (plain text)
trs = soup.find_all('tr')
movies = []
for tr in trs:
    movie = {}
    # # 1. Using the .string attribute
    # tds = tr.find_all('td')
    # title = tds[0].string
    # movie['title'] = title
    # movies.append(movie)
    # print(title)

    # 2. Using stripped_strings
    info = list(tr.stripped_strings)
    print(info)
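The commented-out `.string` approach above starts building one dictionary per row. A minimal sketch that finishes the idea on a two-row excerpt of the table, assuming every job row has its five cells in the header's order; the English keys are hypothetical names chosen for illustration:

```python
from bs4 import BeautifulSoup

# A two-row excerpt of the article's job table (header row + one job)
text = """
<table class="tablelist">
    <tr class="h"><td class="l">職位名稱</td><td>職位類別</td><td>人數</td><td>地點</td><td>發布時間</td></tr>
    <tr class="even"><td class="l square"><a href="position_detail.php?id=33824">22989-金融雲區塊鏈高級研發工程師（深圳）</a></td><td>技術類</td><td>1</td><td>深圳</td><td>2017-11-25</td></tr>
</table>
"""
soup = BeautifulSoup(text, "html.parser")

# Hypothetical English keys, one per column, in the header's order
keys = ["title", "category", "count", "location", "date"]

jobs = []
for tr in soup.find_all("tr")[1:]:         # [1:] skips the header row
    job = dict(zip(keys, tr.stripped_strings))
    job["url"] = tr.find("a")["href"]       # the link lives in the first cell
    jobs.append(job)

print(jobs)
```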

5. Points to distinguish:

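## Using find_all:
1. When extracting tags, the first argument is the tag name. To filter by attributes as well, pass each attribute name and its value as keyword arguments, or put all the attributes and their values in a dictionary and pass it as the `attrs` argument.
2. When you don't need every match, use the `limit` parameter to cap how many tags are extracted.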
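Keyword arguments only work when the attribute name is a valid Python identifier. A small sketch, with hypothetical `data-id` values, showing where `attrs` is required and what `limit` does:

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li class="job" data-id="33824">A</li>
  <li class="job" data-id="29938">B</li>
  <li class="job" data-id="31236">C</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Keyword form: the attribute name becomes the argument name
# (class_ has a trailing underscore because class is a Python keyword)
jobs = soup.find_all("li", class_="job")

# "data-id" is not a valid identifier, so it has to go through attrs
job_b = soup.find_all("li", attrs={"data-id": "29938"})

# limit stops the search once that many matches have been found
first_two = soup.find_all("li", limit=2)

print(len(jobs), job_b[0].string, [li["data-id"] for li in first_two])
# 3 B ['33824', '29938']
```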

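## The difference between find and find_all:
1. find: returns as soon as it finds the first matching tag. In short, it returns only one element.
2. find_all: returns every matching tag. In short, it can return many tags (as a list).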
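The two methods also differ when nothing matches, which matters before you chain attribute access onto the result. A short sketch:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<table><tr><td>1</td></tr><tr><td>2</td></tr></table>", "html.parser")

first = soup.find("tr")        # a single Tag: the first match
every = soup.find_all("tr")    # a list of Tags
print(first.td.string, len(every))    # 1 2

# With no match, find returns None while find_all returns an empty list,
# so check a find result before calling methods on it
missing = soup.find("a")
print(missing is None, soup.find_all("a") == [])    # True True
```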

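## Filter conditions for find and find_all:
1. Keyword arguments: use the attribute name as the keyword and the attribute value as the argument's value.
2. The attrs argument: put the attribute conditions in a dictionary and pass it as attrs.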
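Filtering on class behaves specially because a tag can carry several class names, as the job table's `class="l square"` cells do. A sketch of how a single name and the full attribute string are matched:

```python
from bs4 import BeautifulSoup

html = '<td class="l square"><a href="#">job</a></td><td class="l">title</td>'
soup = BeautifulSoup(html, "html.parser")

# One class name matches any tag whose class list contains it
square = soup.find_all("td", class_="square")
any_l = soup.find_all("td", class_="l")

# A full multi-class string only matches that exact attribute value
exact = soup.find_all("td", attrs={"class": "l square"})
reordered = soup.find_all("td", attrs={"class": "square l"})

print(len(square), len(any_l), len(exact), len(reordered))    # 1 2 1 0
```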

## Getting a tag's attributes:
1. By subscript: index the tag with the attribute name.
    ```python
    href = a['href']
    ```
2. Through the attrs attribute, a dictionary of all the tag's attributes. Example:
    ```python
    href = a.attrs['href']
    ```
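Subscripting raises KeyError when the attribute is missing, so for optional attributes a tag's `.get()` method (which behaves like `dict.get`) is safer. A short sketch:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<a target="_blank" href="position_detail.php?id=33824">job</a>', "html.parser")
a = soup.a

print(a["href"])             # position_detail.php?id=33824
print(a.attrs)               # {'target': '_blank', 'href': 'position_detail.php?id=33824'}
print(a.get("title"))        # None: the attribute is missing
print(a.get("title", "n/a")) # n/a: the default value is returned instead
# a["title"] would raise KeyError here
```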

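## The string, strings and stripped_strings attributes and the get_text method:
1. string: gets the non-tag string inside a tag, returned as a single string. If the tag has more than one child node, string returns None.
2. strings: gets all the non-tag strings among a tag's descendants, returned as a generator.
3. stripped_strings: like strings, but strips leading and trailing whitespace and skips strings that are entirely whitespace. Also returned as a generator.
4. get_text(): gets all the non-tag strings among a tag's descendants, joined into one ordinary string rather than returned as a list.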
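The four can be compared side by side on a tiny fragment. A sketch using html.parser:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Hello <b>world</b></p><td><a>job</a></td>", "html.parser")
p, td = soup.p, soup.td

print(p.string)                     # None: <p> has two children
print(td.string)                    # job: a lone child tag passes its .string up
print(list(p.strings))              # ['Hello ', 'world']
print(list(p.stripped_strings))     # ['Hello', 'world']
print(repr(p.get_text()))           # 'Hello world'
print(p.get_text("|", strip=True))  # Hello|world (separator, then strip each piece)
```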

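CSS selectors

This is another way of searching that achieves much the same as find_all.

When writing CSS, a tag name is written as is, a class name is prefixed with ., and an id is prefixed with #.
We can filter elements in a similar way with soup.select(), which returns a list.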
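The snippets in this section run `soup.select()` on the "three sisters" sample document (a variant of the one in the official Beautiful Soup documentation), but the article never defines it at this point. A minimal sketch that rebuilds it from the markup printed in the descendants example later in the article, parses it with html.parser, and runs a few of the selectors described here. `select_one()`, available since Beautiful Soup 4.4, returns the first match instead of a list:

```python
from bs4 import BeautifulSoup

# Reconstructed from the markup the article prints in its descendants example
html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>"""

soup = BeautifulSoup(html_doc, "html.parser")

print([a["id"] for a in soup.select("a")])      # ['link1', 'link2', 'link3']
print(len(soup.select(".sister")))              # 3
print(soup.select("#link2")[0].string)          # Lacie
print(soup.select_one("p.story a")["href"])     # http://example.com/elsie
```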

(1) Find by tag name

print(soup.select('title'))
#[<title>The Dormouse's story</title>]

print(soup.select('a'))
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

print(soup.select('b'))
#[<b>The Dormouse's story</b>]

(2) Find by class name

print(soup.select('.sister'))
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

(3) Find by id

print(soup.select('#link1'))
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]

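(4) Combined searches

Combined searches follow the same principle as writing a CSS file: tag names, class names and ids are combined. For example, to find the element with id link1 inside a p tag, separate the two parts with a space:

print(soup.select('p #link1'))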
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]

To match direct children only, separate the parts with >:

print(soup.select("head > title"))
#[<title>The Dormouse's story</title>]
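The difference between a space and > shows up as soon as a tag is nested more than one level deep. A small sketch:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<body><p><a id='link1'>x</a></p></body>", "html.parser")

descendants = soup.select("body a")    # a space matches descendants at any depth
children = soup.select("body > a")     # > matches direct children only
print(len(descendants), len(children), len(soup.select("p > a")))    # 1 0 1
```

Here `<a>` is a grandchild of `<body>`, so only the descendant selector and `p > a` find it.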

(5) Attribute search

print(soup.select('a[class="sister"]'))
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

print(soup.select('a[href="http://example.com/elsie"]'))
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]

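Attribute selectors can also be combined with the selectors above: parts that apply to different nodes are separated by a space, while parts that apply to the same node are written together with no space.

print(soup.select('p a[href="http://example.com/elsie"]'))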
#[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
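Besides exact matches, CSS attribute selectors support prefix, suffix and substring operators. A sketch assuming Beautiful Soup 4.7 or later, where select() is backed by the soupsieve package; the third link is made up for illustration:

```python
from bs4 import BeautifulSoup

html = """<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
<a class="brother" href="https://example.org/tom" id="link3">Tom</a>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# Same node: tag, class, id and attribute parts are written without spaces
same_node = [a.string for a in soup.select("a.sister#link2")]
two_attrs = [a.string for a in soup.select('a[class="sister"][id="link1"]')]
# Different nodes: a space separates the ancestor (p) from the descendant (a)
with_href = soup.select("p a[href]")
# ^= starts with, $= ends with (*= would mean "contains")
http_only = [a["id"] for a in soup.select('a[href^="http://"]')]
ends_tom = [a["id"] for a in soup.select('a[href$="tom"]')]

print(same_node, two_attrs, len(with_href), http_only, ends_tom)
# ['Lacie'] ['Elsie'] 3 ['link1', 'link2'] ['link3']
```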

(6) Getting the content

Every select call above returns a list, so iterate over it and call get_text() on each element to get its text.

soup = BeautifulSoup(html, 'lxml')
print(type(soup.select('title')))
print(soup.select('title')[0].get_text())

for title in soup.select('title'):
    print(title.get_text())

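(7) Examples of extracting elements with select and CSS selectors:

The elements returned by select are Tag objects, so you can read an attribute by name, e.g. a['href'], and you can also use .string, .strings, .stripped_strings and .get_text().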

#encoding: utf-8

from bs4 import BeautifulSoup

html = """
<table class="tablelist" cellpadding="0" cellspacing="0">
    <tbody>
        <tr class="h">
            <td class="l" width="374">職位名稱</td>
            <td>職位類別</td>
            <td>人數</td>
            <td>地點</td>
            <td>發布時間</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=33824&keywords=python&tid=87&lid=2218">22989-金融雲區塊鏈高級研發工程師（深圳）</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-25</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=29938&keywords=python&tid=87&lid=2218">22989-金融雲高級后台開發</a></td>
            <td>技術類</td>
            <td>2</td>
            <td>深圳</td>
            <td>2017-11-25</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=31236&keywords=python&tid=87&lid=2218">SNG16-騰訊音樂運營開發工程師（深圳）</a></td>
            <td>技術類</td>
            <td>2</td>
            <td>深圳</td>
            <td>2017-11-25</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=31235&keywords=python&tid=87&lid=2218">SNG16-騰訊音樂業務運維工程師（深圳）</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-25</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=34531&keywords=python&tid=87&lid=2218">TEG03-高級研發工程師（深圳）</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=34532&keywords=python&tid=87&lid=2218">TEG03-高級圖像算法研發工程師（深圳）</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=31648&keywords=python&tid=87&lid=2218">TEG11-高級AI開發工程師（深圳）</a></td>
            <td>技術類</td>
            <td>4</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a target="_blank" href="position_detail.php?id=32218&keywords=python&tid=87&lid=2218">15851-后台開發工程師</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="even">
            <td class="l square"><a target="_blank" href="position_detail.php?id=32217&keywords=python&tid=87&lid=2218">15851-后台開發工程師</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
        <tr class="odd">
            <td class="l square"><a id="test" class="test" target='_blank' href="position_detail.php?id=34511&keywords=python&tid=87&lid=2218">SNG11-高級業務運維工程師（深圳）</a></td>
            <td>技術類</td>
            <td>1</td>
            <td>深圳</td>
            <td>2017-11-24</td>
        </tr>
    </tbody>
</table>
"""

# 1. Get all tr tags
# 2. Get the second tr tag
# 3. Get all tr tags whose class is even
# 4. Get the href attribute of every a tag
# 5. Get all the job information (plain text)

soup = BeautifulSoup(html,'lxml')

# 1. Get all tr tags
# trs = soup.select("tr")
# for tr in trs:
#     print(type(tr))
#     print("="*30)
#     break

# 2. Get the second tr tag
# tr = soup.select('tr')[1]
# print(tr)

# 3. Get all tr tags whose class is even
# trs = soup.select(".even")
# trs = soup.select("tr[class='even']")
# for tr in trs:
#     print(tr)


# 4. Get the href attribute of every a tag
# aList = soup.select('a')
# for a in aList:
#     href = a['href']
#     print(href)

# 5. Get all the job information (plain text)
trs = soup.select('tr')
for tr in trs:
    infos = list(tr.stripped_strings)
    print(infos)
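CSS pseudo-classes can express some of the filtering done above in Python, such as skipping the header row or picking one column. A sketch assuming Beautiful Soup 4.7 or later (soupsieve-backed select()), on a shortened version of the job table with made-up titles:

```python
from bs4 import BeautifulSoup

html = """<table>
<tr class="h"><td>職位名稱</td><td>人數</td></tr>
<tr class="even"><td><a href="position_detail.php?id=33824">job A</a></td><td>1</td></tr>
<tr class="odd"><td><a href="position_detail.php?id=29938">job B</a></td><td>2</td></tr>
</table>"""
soup = BeautifulSoup(html, "html.parser")

# :not(.h) skips the header row; td:nth-of-type(2) is each row's second cell
counts = [td.get_text() for td in soup.select("tr:not(.h) > td:nth-of-type(2)")]
titles = [a.get_text() for a in soup.select("tr:not(.h) a")]
print(titles, counts)    # ['job A', 'job B'] ['1', '2']
```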


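The four kinds of objects

Beautiful Soup turns a complex HTML document into a complex tree structure in which every node is a Python object. All of these objects fall into four kinds:

Tag: every tag in Beautiful Soup is a Tag, and the BeautifulSoup object itself is essentially a Tag too. Methods such as find and find_all are therefore defined on Tag rather than on BeautifulSoup.
NavigableString: inherits from Python's str and is used exactly like a str.
BeautifulSoup: inherits from Tag and is used to build the Beautiful Soup tree. Search methods such as find and select are likewise inherited from Tag.
Comment: not much to say; it inherits from NavigableString and represents an HTML comment such as <!-- Elsie -->.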
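A quick way to see the four kinds is to check the types Beautiful Soup hands back. A small sketch using html.parser; the `<!-- Elsie -->` comment mirrors the sample document used in the CSS selector section:

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

soup = BeautifulSoup('<p><a id="link1"><!-- Elsie --></a><b>Lacie</b></p>', "html.parser")

print(isinstance(soup, Tag))                # True: BeautifulSoup subclasses Tag
print(type(soup.b).__name__)                # Tag
print(type(soup.b.string).__name__)         # NavigableString
print(isinstance(soup.b.string, str))       # True: NavigableString subclasses str

comment = soup.a.string                     # the <a> holds only a comment
print(type(comment).__name__, repr(str(comment)))    # Comment ' Elsie '
print(isinstance(comment, NavigableString))          # True
```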

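contents and children:
Both return a tag's direct children, including strings. The difference is that contents returns a list, while children returns an iterator.

.contents and .children contain only a tag's direct children, whereas .descendants recurses over all of a tag's descendants. Like children, it has to be iterated to get at its contents.

for child in soup.descendants:
    print(child)

Output: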

<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
<head><title>The Dormouse's story</title></head>
<title>The Dormouse's story</title>
The Dormouse's story


<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body>


<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<b>The Dormouse's story</b>
The Dormouse's story


<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
Once upon a time there were three little sisters; and their names were

<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
 Elsie 
,

<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
Lacie
 and

<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
Tillie
;
and they lived at the bottom of a well.


<p class="story">...</p>
...
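To compare .contents, .children and .descendants on something smaller than the full document above, here is a short sketch using html.parser:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Once <b>upon</b></p>", "html.parser")
p = soup.p

kids = p.contents                          # a list of the direct children
print(len(kids), kids[1].name)             # 2 b
print(list(p.children) == kids)            # True: same nodes, yielded one at a time
print([type(node).__name__ for node in p.descendants])
# ['NavigableString', 'Tag', 'NavigableString']: 'Once ', <b>, then 'upon' inside <b>
```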

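Exercise: scraping data for every city from China Weather (weather.com.cn)

Analyze the HTML structure of the pages, query them with BeautifulSoup, and display the result with pyecharts. The collected data is sorted by minimum temperature, and map() is then used to build the lists the chart needs.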

#encoding: utf-8

import requests
from bs4 import BeautifulSoup
from pyecharts import Bar

ALL_DATA = []

def parse_page(url):
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36"
    }
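    response = requests.get(url,headers=headers,timeout=10)
    text = response.content.decode('utf-8')
    # html5lib (pip install html5lib) is slow, but it is the most fault-tolerant
    # parser and handles malformed markup the way a browser does
    soup = BeautifulSoup(text,'html5lib')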
    conMidtab = soup.find('div',class_='conMidtab')
    tables = conMidtab.find_all('table')
    for table in tables:
        trs = table.find_all('tr')[2:]
        for index,tr in enumerate(trs):
            tds = tr.find_all('td')
            city_td = tds[0]
            if index == 0:
                city_td = tds[1]
            city = list(city_td.stripped_strings)[0]
            temp_td = tds[-2]
            min_temp = list(temp_td.stripped_strings)[0]
            ALL_DATA.append({"city":city,"min_temp":int(min_temp)})
            # print({"city":city,"min_temp":int(min_temp)})

def main():
    urls = [
        'http://www.weather.com.cn/textFC/hb.shtml',
        'http://www.weather.com.cn/textFC/db.shtml',
        'http://www.weather.com.cn/textFC/hd.shtml',
        'http://www.weather.com.cn/textFC/hz.shtml',
        'http://www.weather.com.cn/textFC/hn.shtml',
        'http://www.weather.com.cn/textFC/xb.shtml',
        'http://www.weather.com.cn/textFC/xn.shtml',
        'http://www.weather.com.cn/textFC/gat.shtml'
    ]
    for url in urls:
        parse_page(url)

    # Analyze the data
    # Sort by minimum temperature, lowest first
    ALL_DATA.sort(key=lambda data:data['min_temp'])

    data = ALL_DATA[0:10]
    cities = list(map(lambda x:x['city'],data))
    temps = list(map(lambda x:x['min_temp'],data))
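    # pyecharts: this is the 0.5.x API (pip install "pyecharts<1.0");
    # version 1.0 and later moved Bar to pyecharts.charts and changed its interface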
    chart = Bar("中國天氣最低氣溫排行榜")
    chart.add('',cities,temps)
    chart.render('temperature.html')


if __name__ == '__main__':
    main()
    # ALL_DATA = [
    #     {"city": "北京", 'min_temp': 0},
    #     {"city": "天津", 'min_temp': -8},
    #     {"city": "石家庄", 'min_temp': -10}
    # ]
    #
    # ALL_DATA.sort(key=lambda data:data['min_temp'])
    # print(ALL_DATA)
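main() sorts the whole list, slices off the first ten rows and uses map() with lambdas to split them into two lists. A sketch of that step on the three sample rows from the commented-out test data, followed by one alternative that is not what the script uses: the standard library's heapq.nsmallest, which avoids sorting every city when only the ten coldest are needed:

```python
import heapq

# Sample rows in the shape parse_page() appends to ALL_DATA
all_data = [
    {"city": "北京", "min_temp": 0},
    {"city": "天津", "min_temp": -8},
    {"city": "石家庄", "min_temp": -10},
]

# The script's approach: sort everything, slice, then map() out two lists
all_data.sort(key=lambda data: data["min_temp"])
data = all_data[0:10]
cities = list(map(lambda x: x["city"], data))
temps = list(map(lambda x: x["min_temp"], data))
print(cities, temps)    # ['石家庄', '天津', '北京'] [-10, -8, 0]

# Alternative: nsmallest keeps only the 10 coldest rows (same order as sorted()[:10])
coldest = heapq.nsmallest(10, all_data, key=lambda d: d["min_temp"])
print([d["city"] for d in coldest] == cities)    # True
```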
