Python Web Scraping - A Simple Crawler Example


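Now let's build a real crawler example.

It scrapes the list of recommended articles, with their titles and URLs, from the home page of my cnblogs personal blog.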

scrape_home_articles.py

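from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

# Article URLs on this blog start with this prefix; the dots are escaped
# so that "." matches only a literal dot
article_url = re.compile(r"^http://www\.cnblogs\.com/davidgu/p/")

html = urlopen("http://www.cnblogs.com/davidgu")
bsObj = BeautifulSoup(html, "html.parser")

# Search only inside the main content container of the page
container = bsObj.find("div", {"id": "main_container"})
for link in container.find_all("a", href=article_url):
    # Skip anchors that carry a class attribute
    if "class" not in link.attrs:
        print(link.string)
        print(link.attrs["href"])
        print("--------------------------------------------------------------")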

Output:
[置頂]解決adb server端口被占用的問題
http://www.cnblogs.com/davidgu/p/4515236.html
--------------------------------------------------------------
[置頂]解決Eclipse下不自動拷貝apk到模擬器問題( The connection to adb is down, and a sever
http://www.cnblogs.com/davidgu/p/4390661.html
--------------------------------------------------------------
常用的正則表達式一覽
http://www.cnblogs.com/davidgu/p/4831357.html
--------------------------------------------------------------
C++ 11 - STL - 函數對象(Function Object) (上)
http://www.cnblogs.com/davidgu/p/4829097.html
--------------------------------------------------------------

...
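
The filter the script applies to each link can be tried on its own, without fetching the page. The following is only a minimal sketch using the standard library: the helper name is_article_link and the sample anchors are hypothetical, and plain dicts stand in for link.attrs. The pattern uses the same URL prefix as the script, written with escaped dots and a trailing slash.

```python
import re

# Prefix of article URLs on the blog. The dots are escaped so that "."
# matches only a literal dot, not any character.
ARTICLE_URL = re.compile(r"^http://www\.cnblogs\.com/davidgu/p/")


def is_article_link(attrs):
    """Return True if an anchor with these attributes passes the filter.

    attrs is a plain dict of tag attributes (a stand-in for link.attrs).
    """
    href = attrs.get("href", "")
    return bool(ARTICLE_URL.match(href)) and "class" not in attrs


# Hypothetical anchors for illustration; only the first URL is taken
# from the output above.
samples = [
    {"href": "http://www.cnblogs.com/davidgu/p/4515236.html"},
    {"href": "http://www.cnblogs.com/davidgu/p/4515236.html", "class": ["readmore"]},
    {"href": "http://www.cnblogs.com/davidgu/category/1.html"},
    {"href": "http://wwwXcnblogs.com/davidgu/p/1.html"},
]

results = [is_article_link(attrs) for attrs in samples]
print(results)
```

Only the first sample passes: the second carries a class attribute, the third is not under the /p/ path, and the fourth would slip through if the dots in the pattern were left unescaped.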

 

