01. Blog Crawler

Reposted article, 2019-04-11 20:33, category: Python練習冊 (Python Exercise Book)

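The task is to scrape and print every comment on the article 《未來已來（四）——Python學習進階圖譜》 ("The Future Is Here (4): A Roadmap for Advancing in Python") from the blog 【人人都是蜘蛛俠】 ("Everyone Is Spider-Man").

Article URL: https://wordpress-edu-3autumn.localprod.forc.work/all-about-the-future_04/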

# 1. Blog crawler
#    Scrape and print every comment on the article 《未來已來（四）——Python學習進階圖譜》
#    from the blog 【人人都是蜘蛛俠】.
#    Article URL: https://wordpress-edu-3autumn.localprod.forc.work/all-about-the-future_04/
import requests
from bs4 import BeautifulSoup

# Fetch the article page and parse its HTML
res = requests.get('https://wordpress-edu-3autumn.localprod.forc.work/all-about-the-future_04/')
html = res.text
soup = BeautifulSoup(html, 'html.parser')
# Each comment body sits in a <div class="comment-content">
items = soup.find_all('div', class_='comment-content')
for item in items:
    # Print the text of the first <p> inside each comment
    print(item.find('p').text)

'''
Output:
    測試評論
    我們就是
    minu
    kpi
'''

'''
# The teacher's code follows

# Import the requests library
import requests
# Import the BeautifulSoup class
from bs4 import BeautifulSoup
# Assign the URL to the variable url_destnation
url_destnation = 'https://wordpress-edu-3autumn.localprod.forc.work/all-about-the-future_04/'
# requests.get returns a Response object, assigned to res_comment
res_comment = requests.get(url_destnation)
# Parse the page into a BeautifulSoup object
bs_comment = BeautifulSoup(res_comment.text, 'html.parser')
# Extract the elements we want by matching on the class attribute
list_comments = bs_comment.find_all('div', class_='comment-content')
# Loop over the list, taking each element in turn
for tag_comment in list_comments:
    # Print the text of the comment
    print(tag_comment.text)
'''
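
Both versions above fetch and parse in one pass and assume the request succeeds. The following is only a sketch of how the same steps could be split so that the parsing part can be tried without network access. The function names `fetch_html` and `extract_comments`, the 10-second timeout, and the use of `raise_for_status()` are my own assumptions, not part of the original exercise.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML text (hypothetical helper)."""
    res = requests.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error page.
    res.raise_for_status()
    return res.text


def extract_comments(html):
    """Return the text of every <div class="comment-content"> as a list of strings."""
    soup = BeautifulSoup(html, 'html.parser')
    comments = []
    for div in soup.find_all('div', class_='comment-content'):
        # Join all text pieces with a newline and drop surrounding whitespace,
        # so a comment with several <p> tags keeps all of its paragraphs.
        comments.append(div.get_text('\n', strip=True))
    return comments
```

With these helpers, `for comment in extract_comments(fetch_html(url)): print(comment)` would print the comments for the article URL given above, provided the site is reachable.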

Each Tag in items has the following content:

<div class="comment-content">
<p>第1個蜘蛛俠</p>
</div>
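
The two solutions read the text differently: one uses `item.find('p').text`, the other `tag_comment.text`. A small sketch on the sample Tag above shows how the results may differ. The variable names are illustrative, and `get_text(strip=True)` is an extra option that the original code does not use.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the Tag shown above.
sample_html = '<div class="comment-content">\n<p>第1個蜘蛛俠</p>\n</div>'

soup = BeautifulSoup(sample_html, 'html.parser')
tag = soup.find('div', class_='comment-content')

# .text on the div returns all nested text, including the line breaks around <p>.
whole_text = tag.text
# .find('p').text returns only the text of the first <p>.
first_p_text = tag.find('p').text
# get_text(strip=True) strips whitespace from each text piece and drops empty ones.
stripped_text = tag.get_text(strip=True)

print(repr(whole_text))
print(repr(first_p_text))
print(repr(stripped_text))
```

On this sample, `whole_text` carries a leading and a trailing newline, while the other two values contain only the comment itself. That would explain blank lines appearing around each comment when `tag_comment.text` is printed directly.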
