Common Tools for Python Crawlers: PyQuery

Reposted article; original dated 2017-03-27 16:41.

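PyQuery is a commonly used library for parsing pages. It gives Python a jQuery-style API for selecting and reading elements, built on top of lxml.
Below is a piece of code that parses a basic page. More complex or more practical usage will be added later as it comes up.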

from pyquery import PyQuery as pq


# Input as a string
html_str = "<html></html>"

# Input as a URL (must include the http:// scheme)
your_url = "http://www.baidu.com"

# Input as a file
path_to_html_file = "hello123.html"

# Pass one of the inputs to pq() to get the parsed HTML document
# d = pq(html_str)
# d = pq(etree.fromstring(html_str))  # needs: from lxml import etree
# d = pq(url=your_url)
# d = pq(url=your_url,
#        opener=lambda url, **kw: urlopen(url).read())  # urlopen must be imported first
d = pq(filename=path_to_html_file)

# 'd' now behaves like jQuery's '$': a selector that picks elements by tag, id, class, and so on

# Select by id
table = d("#my_table")

# Select by tag
head = d("head")

# Select by class; to match several classes, put the selectors in one string separated by commas
p = d(".p_font")

# Get the text inside the element
text = p.text()
print(text)

# Get an attribute value
t_class = table.attr('class')
print(t_class)

# Iterate over a selection
# Print the text of the td cells in the table
for item in table.items():
    # This loop runs only once because table matches a single element; item is still a PyQuery object
    print(item.text())

for item in table('td'):
    # This loop runs once per td; item is an lxml element, so text is an attribute rather than a method
    print(item.text)

The HTML used for testing:

<html>
<head>
<title>Test</title>
</head>
<body>
<h1>Parse me!</h1>
<img src = "" />
<p>A paragraph.</p>
<p class = "p_font">A paragraph with class.</p>
<!-- comment -->
<div>
<p>A paragraph in div.</p>
</div>
<table id = "my_table" class = "test-table">
<thead>
</thead>
<tbody>
<tr>
<td>Month</td>
<td>Savings</td>
</tr>
<tr>
<td>January</td>
<td>$100</td>
</tr>
</tbody>
</table>
</body>
</html>

Parsing this HTML produces the following output:

A paragraph with class.
test-table
Month Savings January $100
Month
Savings
January
$100

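Because this code runs on Python 2, some pages fetched directly with requests cause encoding problems when their content is passed straight to pq(). Converting the content with unicode() first resolves this. The relevant part of the code is: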

import requests
from pyquery import PyQuery as pq

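r = requests.get('http://www.baidu.com')
# Passing the raw bytes directly can run into encoding problems on Python 2:
# d = pq(r.content)
# Decode the bytes to a unicode string first, then parse it
u = unicode(r.content, 'utf-8')
d = pq(u)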
