Fetching a blog's score and rank, storing them in a database, and plotting the data (python, selenium, matplotlib)


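      Purpose of this script: fetch the blog's rank and score, store the capture time, rank, and score in a database, then plot the most recent score and rank data to see how the score or rank changes over time.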

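      Overall flow: the script is written in Python 3. It uses selenium to fetch the page, re regular expressions to extract the score and the rank, pymysql to connect to the MySQL database, and finally matplotlib to draw the plot.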

  First, create the database: xiaoshitou

  Then create the table blog_rank:

CREATE TABLE `blog_rank` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
`rank` varchar(255) NOT NULL DEFAULT '' COMMENT 'rank',
`score` varchar(255) NOT NULL DEFAULT '' COMMENT 'score',
`create_time` varchar(255) NOT NULL DEFAULT '' COMMENT 'time added',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=27 DEFAULT CHARSET=utf8;

  Here is the resulting plot:

    The data stored in the blog_rank table:

    Now for the implementation:

    1. This file (operation_mysql.py) uses pymysql to connect to the database and to insert and query data.

#coding=utf-8

import pymysql as MySQLdb
import datetime

host = '127.0.0.1'
user = 'root'
passwd = '123456'
port = 3306
db = 'xiaoshitou'
class OperationMySQL(object):

    def __init__(self):
        """Connect to the database."""
        try:
            self.conn = MySQLdb.connect(host=host,
                               port=port,
                               user=user,
                               passwd=passwd,
                               db=db,
                               charset='utf8', )
            self.cur = self.conn.cursor()
        except Exception as e:
            # str(e) is required: concatenating str and an Exception raises TypeError
            print('Connect MySQL Database Fail: ' + str(e))

    def _close_connect(self):
        """Close the cursor and the connection."""
        self.cur.close()
        self.conn.close()

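    def insert_data(self, data):
        """Insert one record."""
        # Let the driver bind the values instead of formatting them into the SQL string.
        # `rank` is a reserved word in MySQL 8.0, so it is quoted with backticks.
        sql = 'insert into blog_rank (`rank`,score,create_time) values (%s,%s,%s)'
        params = (data['rank'], data['score'], str(datetime.datetime.now().timestamp()))
        self.cur.execute(sql, params)
        self.conn.commit()
        self._close_connect()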

    def select_data(self, sql=None):
        """Run a query (all records ordered by create_time by default) and return a list of dicts."""
        if sql is None:
            # `rank` is a reserved word in MySQL 8.0, so it is quoted with backticks
            sql = 'select `rank`,score,create_time from blog_rank order by create_time'
        self.cur.execute(sql)
        result = self.cur.fetchall()
        self._close_connect()
        headers = ('rank', 'score', 'create_time')
        results = [dict(zip(headers, row)) for row in result]
        # print(results)
        return results


if __name__ == '__main__':
    OperationMySQL().select_data()
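
  The insert-and-query pattern above can be tried without a MySQL server. The following is only a minimal sketch under stated assumptions: it uses the standard-library sqlite3 module as a stand-in for pymysql, the helper names (make_demo_db, insert_row, select_rows) are hypothetical, and the sample values are made up. Note that sqlite3 uses `?` placeholders where pymysql uses `%s`.

```python
import sqlite3
import time


def make_demo_db():
    """Create an in-memory table shaped like blog_rank (sqlite3 stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE blog_rank ('
        'id INTEGER PRIMARY KEY AUTOINCREMENT, '
        '"rank" TEXT NOT NULL, score TEXT NOT NULL, create_time TEXT NOT NULL)'
    )
    return conn


def insert_row(conn, data, now=None):
    """Insert one record; the driver binds the values, nothing is spliced into the SQL."""
    created = str(time.time() if now is None else now)
    conn.execute(
        'INSERT INTO blog_rank ("rank", score, create_time) VALUES (?, ?, ?)',
        (data["rank"], data["score"], created),
    )
    conn.commit()


def select_rows(conn):
    """Return all records as dicts, in insertion order."""
    headers = ("rank", "score", "create_time")
    cur = conn.execute('SELECT "rank", score, create_time FROM blog_rank ORDER BY id')
    return [dict(zip(headers, row)) for row in cur.fetchall()]


# Made-up sample values, for illustration only
conn = make_demo_db()
insert_row(conn, {"rank": "123456", "score": "789"}, now=1700000000.0)
insert_row(conn, {"rank": "123000", "score": "801"}, now=1700086400.0)
rows = select_rows(conn)
print(rows)
```

  Passing the values as a separate tuple keeps quoting and escaping in the driver's hands, which is the main reason to prefer it over building the statement with str.format.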

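  2. get_my_blog_score.py: this file fetches the page content, parses the rank and the score, stores the captured data in the database, and reads the database back to draw the plot.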

# coding=utf-8
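try:
    import requests
except ImportError:
    # Fallback kept from the original: install requests on the fly if it is missing.
    # (requests is not used elsewhere in this script.)
    import os
    os.system('pip install requests')
    import requests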
import re
from selenium import webdriver
from time import sleep
from operation_mysql import OperationMySQL


class GetMyBlogScore:
    """Fetch the cnblogs score and rank."""
    def __init__(self):
        pass

    def _get_blog_content(self):
        """Fetch the page content of the blog."""
        url = "http://www.cnblogs.com/xiaoshitoutest"
        driver = webdriver.Firefox()
        sleep(1)
        driver.get(url)
        sleep(1)
        self.content = driver.page_source
        driver.quit()
        return
        
    def _match_content(self, compile_str_args):
        """Match the pattern against the page and return only the digits of the first match."""
        compile_str = re.compile(compile_str_args)
        result = compile_str.findall(self.content)
        final_str = re.sub(r'\D', '', result[0])
        return final_str

    def _save_database(self, data):
        """Write the result to the database."""
        if isinstance(data, dict) and data is not None:
            OperationMySQL().insert_data(data)
            print('Insert Data Success.')
        else:
            print('The data is invalid.')

    def _show_map(self):
        """Read the stored values from the database, plot them, and save the figure."""
        datas = OperationMySQL().select_data()
        import matplotlib.pyplot as plt
        from datetime import datetime
        from matplotlib.dates import DateFormatter
        import matplotlib.dates as dates

        # create_time is stored as a Unix timestamp string; convert it to datetime for the x axis
        x_ = [datetime.fromtimestamp(float(x['create_time'])) for x in datas]
        # score and rank are varchar columns; convert them to int so they plot as numbers
        score = [int(x['score']) for x in datas]
        rank = [int(x['rank']) for x in datas]

        # CJK-capable font; only needed if the labels contain Chinese characters
        plt.rcParams['font.sans-serif'] = ['FangSong']

        fig, ax = plt.subplots()
        ax.xaxis.set_major_locator(dates.DayLocator())
        ax.xaxis.set_major_formatter(DateFormatter('%Y-%m-%d'))

        # ax.plot accepts datetime values directly (plot_date is deprecated since Matplotlib 3.9)
        ax.plot(x_, score, '--')
        ax.set_xlabel('Date')
        ax.set_ylabel('Score')
        ax.set_title('cnblogs ranking: score')
        fig.autofmt_xdate()
        # plt.show()
        plt.savefig('./rank_score.png')


    def run(self):
        score = r'<li.*?class="liScore">([\s\S]*?)</li>'
        rank = r'<li.*?class="liRank">([\s\S]*?)</li>'
        self._get_blog_content()
        scores = self._match_content(score)
        ranks = self._match_content(rank)
        result = dict(zip(['score', 'rank'], [scores, ranks]))
        self._save_database(result)
        self._show_map()


if __name__ == '__main__':
    GetMyBlogScore().run()

  Running get_my_blog_score.py directly creates an image named rank_score.png in the current directory, which shows how the score changes over time.
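
  The regex step in _match_content can be tried offline, without launching a browser. The sketch below reuses the two patterns from run(); the HTML fragment is a hypothetical sample (the real cnblogs markup may differ), and extract_number is an illustrative helper that returns None instead of raising IndexError when nothing matches.

```python
import re

# Hypothetical fragment for illustration; the real page markup may differ
SAMPLE_HTML = """
<div id="sidebar_scorerank">
  <ul>
    <li class="liScore">Score - 12345</li>
    <li class="liRank">Rank - 67890</li>
  </ul>
</div>
"""

# Same patterns as in run()
SCORE_PATTERN = r'<li.*?class="liScore">([\s\S]*?)</li>'
RANK_PATTERN = r'<li.*?class="liRank">([\s\S]*?)</li>'


def extract_number(pattern, html):
    """Return the digits inside the first match, or None if nothing matches."""
    match = re.search(pattern, html)
    if match is None:
        return None
    # Drop every non-digit character, as _match_content does
    digits = re.sub(r"\D", "", match.group(1))
    return digits or None


score = extract_number(SCORE_PATTERN, SAMPLE_HTML)
rank = extract_number(RANK_PATTERN, SAMPLE_HTML)
print(score, rank)
```

  Returning None for a missing element lets the caller skip the database insert when the page layout changes, rather than failing on result[0].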

  The image at the start was the time vs. score plot. Here is one more: the score vs. rank plot.
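
  The code for the score vs. rank plot is not shown in the post. One possible way to draw it, as a rough sketch only: the function names, the output file name score_rank.png, and the sample rows below are all assumptions, and the rows merely mimic the shape returned by select_data(). The plotting function needs matplotlib and is left uncalled so the data preparation can be run on its own.

```python
from datetime import datetime, timezone


def rows_to_series(rows):
    """Convert the string fields to numbers and sort the records by capture time."""
    parsed = [
        (float(r["create_time"]), int(r["score"]), int(r["rank"]))
        for r in rows
    ]
    parsed.sort(key=lambda item: item[0])
    times = [datetime.fromtimestamp(t, tz=timezone.utc) for t, _, _ in parsed]
    scores = [s for _, s, _ in parsed]
    ranks = [k for _, _, k in parsed]
    return times, scores, ranks


def plot_score_vs_rank(scores, ranks, path="./score_rank.png"):
    """Plot rank against score and save the figure (requires matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(scores, ranks, "o--")
    ax.invert_yaxis()  # a smaller rank number is better, so put it at the top
    ax.set_xlabel("Score")
    ax.set_ylabel("Rank")
    ax.set_title("cnblogs score vs. rank")
    fig.savefig(path)
    plt.close(fig)


# Made-up sample rows in the shape returned by select_data()
sample_rows = [
    {"rank": "123000", "score": "801", "create_time": "1700086400.0"},
    {"rank": "123456", "score": "789", "create_time": "1700000000.0"},
]
times, scores, ranks = rows_to_series(sample_rows)
print(scores, ranks)
# plot_score_vs_rank(scores, ranks)  # uncomment to write score_rank.png
```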

