Python3---BeautifulSoup---Node Selectors


Code:

from bs4 import BeautifulSoup

# All examples below are tested against this document
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
soup = BeautifulSoup(html_doc, 'lxml')
print("1; get the head tag")
print(soup.head)
print("2; get the b node under the p node")
print(soup.p.b)
# The name property returns the node's tag name
print("4; the name property returns the node name")
print(soup.body.name)
# The attrs property returns the node's attributes as a dict; a single attribute
# can also be read with dict-style indexing. The value is a list for multi-valued
# attributes such as class, and a string otherwise
print("5; get all attributes of the p node")
print(soup.p.attrs)
print("6; get the class attribute of the p node")
print(soup.p.attrs['class'])
print("7; get the class attribute of the p node directly")
print(soup.p['class'])
# The string property returns the text contained in the node.
# soup.p is the first p node in the document
print("8; get the text of the first p node")
print(soup.p.string)
# The contents property returns the node's direct children as a list
print("9; the contents property returns the node's direct children as a list")
print(soup.body.contents)
# The children property also returns the direct children, but as an iterator
print("10; the children property also returns the direct children, but as an iterator")
print(soup.body.children)
# The descendants property returns all descendant nodes as a generator
print("11; the descendants property returns all descendant nodes as a generator")
print(soup.body.descendants)
# parent returns the parent node; parents returns the ancestor nodes as a generator
print("12; parent returns the parent node; parents returns the ancestor nodes as a generator")
print(soup.b.parent)
print(soup.b.parents)
# next_sibling returns the next sibling node
print("13; next_sibling returns the next sibling node")
print(soup.a.next_sibling)
# previous_sibling returns the previous sibling node; text, including newlines, is also a node
print("14; previous_sibling returns the previous sibling node; note that text, including newlines, is also a node")
print(soup.a.previous_sibling)
# next_siblings returns all following sibling nodes as a generator
print("15; next_siblings returns all following sibling nodes")
print(soup.a.next_siblings)
# previous_siblings returns all preceding sibling nodes as a generator; text, including newlines, is also a node
print("16; previous_siblings returns all preceding sibling nodes; note that text, including newlines, is also a node")
print(soup.a.previous_siblings)
# next_element and previous_element return the next or previous parsed object
print("17; next_element and previous_element return the next or previous parsed object")
print(soup.a.next_element)
print(soup.a.previous_element)
# next_elements and previous_elements iterate forward or backward over the parsed document
print("18; next_elements and previous_elements iterate forward or backward over the parsed document")
print(soup.a.next_elements)
print(soup.a.previous_elements)
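As a side note to the listing above: printing children, descendants, or parents only shows an iterator or generator object, so the nodes themselves stay hidden until the object is consumed. The following is a minimal sketch of consuming them with list comprehensions. The small HTML fragment and the variable names are made up for illustration, and html.parser is used instead of lxml only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical fragment for illustration; not the document used in the listing.
html = '<div id="box"><p class="title"><b>Hi</b></p><p class="story">Bye</p></div>'
soup = BeautifulSoup(html, "html.parser")
box = soup.div

# .children yields direct children only.
child_names = [child.name for child in box.children]

# .descendants walks the whole subtree, text nodes included,
# so keep only Tag objects here.
descendant_tags = [node.name for node in box.descendants if isinstance(node, Tag)]

# .parents walks upward from the node to the BeautifulSoup object itself.
ancestor_names = [parent.name for parent in soup.b.parents]

print(child_names)      # ['p', 'p']
print(descendant_tags)  # ['p', 'b', 'p']
print(ancestor_names)   # ['p', 'div', '[document]']
```

Wrapping any of these properties in list(...) works the same way when the whole sequence is needed at once.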

Output of the script:

/home/aaron/桌面/Python3-Test/venv/bin/python /home/aaron/桌面/Python3-Test/bs4-study.py
1; get the head tag
<head><title>The Dormouse's story</title></head>
2; get the b node under the p node
<b>The Dormouse's story</b>
4; the name property returns the node name
body
5; get all attributes of the p node
{'class': ['title']}
6; get the class attribute of the p node
['title']
7; get the class attribute of the p node directly
['title']
8; get the text of the first p node
The Dormouse's story
9; the contents property returns the node's direct children as a list
['\n', <p class="title"><b>The Dormouse's story</b></p>, '\n', <p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>, '\n', <p class="story">...</p>, '\n']
10; the children property also returns the direct children, but as an iterator
<list_iterator object at 0x7f0b1bd17750>
11; the descendants property returns all descendant nodes as a generator
<generator object Tag.descendants at 0x7f0b19e17d50>
12; parent returns the parent node; parents returns the ancestor nodes as a generator
<p class="title"><b>The Dormouse's story</b></p>
<generator object PageElement.parents at 0x7f0b19e17d50>
13; next_sibling returns the next sibling node
,

14; previous_sibling returns the previous sibling node; note that text, including newlines, is also a node
Once upon a time there were three little sisters; and their names were

15; next_siblings returns all following sibling nodes
<generator object PageElement.next_siblings at 0x7f0b19e17d50>
16; previous_siblings returns all preceding sibling nodes; note that text, including newlines, is also a node
<generator object PageElement.previous_siblings at 0x7f0b19e17d50>
17; next_element and previous_element return the next or previous parsed object
Elsie
Once upon a time there were three little sisters; and their names were

18; next_elements and previous_elements iterate forward or backward over the parsed document
<generator object PageElement.next_elements at 0x7f0b19e17d50>
<generator object PageElement.previous_elements at 0x7f0b19e17d50>

Process finished with exit code 0
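
The output for items 13 and 14 shows that the sibling of an a node is often a text node (",\n" here) rather than the neighboring tag. When the next tag is what is wanted, one possible approach is find_next_sibling() and find_next_siblings(), which skip text nodes. The sketch below uses a shortened, hypothetical version of the story paragraph, and html.parser instead of lxml only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the "story" paragraph.
html = """<p class="story">Once
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
end.</p>"""
soup = BeautifulSoup(html, "html.parser")
first_link = soup.a

# next_sibling is the text node that sits between the first two links.
raw_next = first_link.next_sibling

# find_next_sibling() skips text nodes and returns the next matching tag.
next_link = first_link.find_next_sibling("a")

# find_next_siblings() collects every later sibling tag that matches.
later_ids = [a["id"] for a in first_link.find_next_siblings("a")]

print(repr(raw_next))   # ',\n'
print(next_link["id"])  # link2
print(later_ids)        # ['link2', 'link3']
```

Note that id comes back as a plain string, while class in the output above is a list, because class is treated as a multi-valued attribute.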

