My First Python Crawler Program

This article is a repost. 2017-04-05 17:07. Tags: crawler / python

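1. Install the Python environment

Download the installer that matches your operating system from the official site, https://www.python.org/, run it, and configure the environment variables so that Python can be started from a terminal.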
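To confirm that the environment variables point at the interpreter you just installed, a small check like the following can help. It is only a sketch: the helper name `describe_interpreter` is made up for illustration and is not part of the original post.

```python
import platform
import sys


def describe_interpreter():
    """Return the running interpreter's version and executable path."""
    return "Python {} at {}".format(platform.python_version(), sys.executable)


if __name__ == "__main__":
    # Run this file from a terminal; the printed path shows which
    # installation the PATH setting actually resolves to.
    print(describe_interpreter())
```

If the printed version or path is not the one you installed, the PATH entry is probably pointing somewhere else.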

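2. Install the Python plugin for IntelliJ IDEA

I use IDEA. Search for the Python plugin directly in the IDE's plugin settings and install it (search Baidu if you need step-by-step instructions).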

3. Install the BeautifulSoup library

Documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/#attributes
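The link above is the documentation. The original does not say how the library was installed; a common way is `pip install beautifulsoup4` (the package name on PyPI), after which it is imported as `bs4`. The sketch below checks whether the module is importable without actually importing it. The helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("bs4"):
        print("bs4 is available")
    else:
        print("bs4 is missing; try: pip install beautifulsoup4")
```

Running this before the crawler gives a clearer message than an ImportError in the middle of a script.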

4. The crawler program: scraping the "Ing" (閃存) feed on cnblogs (博客園)

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
# Ported to Python 3: urllib2 became urllib.request and print became a function.
import time
import urllib.request

import bs4


class CnBlogsSpider:
    '''Crawler for ing.cnblogs.com'''

    # ${pageNo} is replaced with the page number; run() appends a timestamp after "_="
    url = "https://ing.cnblogs.com/ajax/ing/GetIngList?IngListType=All&PageIndex=${pageNo}&PageSize=30&Tag=&_="

    # Fetch the HTML of the current page into self.html
    def getHtml(self):
        request = urllib.request.Request(self.pageUrl)
        response = urllib.request.urlopen(request)
        self.html = response.read()

    # Parse the HTML and print one line per post
    def analyze(self):
        self.getHtml()
        # Name the parser explicitly so bs4 does not warn or guess one
        bSoup = bs4.BeautifulSoup(self.html, "html.parser")
        divs = bSoup.find_all("div", class_='ing-item')
        for div in divs:
            img = div.find("img")['src']
            item = div.find("div", class_='feed_body')
            userName = item.find("a", class_='ing-author').text
            text = item.find("span", class_='ing_body').text
            pubtime = item.find("a", class_='ing_time').text
            # True when the post carries the star icon
            star = item.find("img", class_='ing-icon') is not None
            print('( avatar: ', img, 'nickname: ', userName, ', post: ', text, ', time: ', pubtime, ', star: ', star, ')')

    # Crawl pages 1 through `page`
    def run(self, page):
        pageNo = 1
        while pageNo <= page:
            self.pageUrl = self.url.replace('${pageNo}', str(pageNo)) + str(int(time.time()))
            print('-------------\r\nData for page ', pageNo, ':', self.pageUrl)
            self.analyze()
            pageNo = pageNo + 1

CnBlogsSpider().run(3)
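The listing builds each page URL by replacing `${pageNo}` and appending the current Unix timestamp, then downloads it. The sketch below pulls those two steps into standalone functions so the URL logic can be checked without touching the network. It is written for Python 3, where the functions of `urllib2` live in `urllib.request`. The function names, the `User-Agent` value and the 10-second timeout are my own assumptions for illustration; the original sends no custom headers and sets no timeout.

```python
import time
import urllib.request

# Same endpoint template as CnBlogsSpider.url in the listing above.
URL_TEMPLATE = (
    "https://ing.cnblogs.com/ajax/ing/GetIngList"
    "?IngListType=All&PageIndex=${pageNo}&PageSize=30&Tag=&_="
)


def build_page_url(page_no, now=None):
    """Fill in the page number and append a Unix timestamp after '_='."""
    if now is None:
        now = time.time()
    return URL_TEMPLATE.replace("${pageNo}", str(page_no)) + str(int(now))


def build_request(page_url, user_agent="Mozilla/5.0"):
    """Wrap the URL in a Request with a User-Agent header (assumed value)."""
    return urllib.request.Request(page_url, headers={"User-Agent": user_agent})


def fetch(page_url, timeout=10):
    """Download one page; the timeout (assumed) stops a stalled connection from hanging."""
    with urllib.request.urlopen(build_request(page_url), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Only prints the URLs; call fetch(url) to actually download a page.
    for page_no in range(1, 4):
        print(build_page_url(page_no))
```

Passing `now` explicitly makes the URL reproducible, which is handy when comparing what the crawler requested against what a browser requested.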

5. Output
