It's the weekend, so let's gossip about the recommended-blog rankings

Reposted from the original post · 2014-03-09 00:42 · 7712 · Big Data

Author: Vamei. Source: http://www.cnblogs.com/vamei. Reposting is welcome, but please keep this notice. Thanks!

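Vamei recently made it onto the cnblogs recommended-blog leaderboard, and I am rather pleased about it. Watching your own rank climb is a happy thing.

Looking at all the gurus on the recommended list, and after some chatting in the QQ group, my inner gossip started itching again. So I wrote a crawler that collects each listed blogger's registration date, follower count, and rank, and turned the data into a bubble chart.

Judging from the chart, rank is not determined entirely by follower count, but the two are quite strongly correlated. The crowd's eyes are sharp indeed.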
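One way to put a number on "quite strongly correlated" is a Spearman rank correlation between rank and follower count. The sketch below is only an illustration and not part of the original crawler: the helper names `ranks`, `pearson`, and `spearman` are made up here, and the toy lists are invented values, not the scraped data.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        # extend j over the run of values equal to values[order[i]]
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = float(len(xs))
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Toy values for illustration only, NOT the scraped data.
toy_rank = [1, 2, 3, 4, 5]
toy_followers = [900, 1200, 400, 300, 100]
rho = spearman(toy_rank, toy_followers)
print(rho)
```

With real data, the two lists would come from the '排名' and '粉絲數' fields the crawler writes. A value near -1 would mean that better-placed bloggers (smaller rank numbers) tend to have more followers.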

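To the gurus whose chubby bubbles have settled at the bottom, Vamei offers his sincere admiration.

Go and find your own bubble. Vamei is hiding in a little corner at the bottom right!

My little crawler is written in Python, and the chart is drawn with D3.js. Likes, comments, and follows are welcome; Vamei would like to sink a bit lower too.

Bubble area is proportional to follower count. The x-axis is the registration date, and the y-axis is the rank.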
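Because it is the area, not the radius, that is proportional to follower count, the radius has to grow with the square root of the count. The following is a minimal Python sketch of that mapping, not the D3.js code used for the actual chart. The names `bubble_radius` and `parse_start_date` and the 40-unit maximum radius are assumptions made for illustration.

```python
import math
from datetime import datetime


def bubble_radius(followers, max_followers, max_radius=40.0):
    """Radius chosen so that the circle AREA is proportional to followers.

    max_radius is a hypothetical size for the largest bubble.
    """
    if max_followers <= 0:
        return 0.0
    return max_radius * math.sqrt(float(followers) / max_followers)


def parse_start_date(text):
    """Parse a 'YYYYMMDD' string, the format the crawler writes out."""
    return datetime.strptime(text, "%Y%m%d")


# A blogger with a quarter of the followers gets half the radius,
# and therefore a quarter of the area.
r_small = bubble_radius(250, 1000)
r_big = bubble_radius(1000, 1000)
x_value = parse_start_date("20140308")
```

Scaling the radius linearly instead would exaggerate large bloggers, since a bubble with twice the radius covers four times the area.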

Older browsers may not support D3. Chrome gives the best result. Could you let me know how the chart looks in other browsers?

Data updated 2014.03.08

Python crawler code:

#-*- coding: UTF-8 -*-
# By Vamei
# scrape the cnblogs

import requests
import BeautifulSoup

import re
import json
from datetime import datetime

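def read_page(url, method="get"):
    '''
    read the html page via the URL,
    retrying until the server answers with HTTP 200
    '''
    # reject unknown methods up front; otherwise r would never be assigned
    if method not in ("get", "post"):
        raise ValueError("unsupported method: %s" % method)
    status_code = 0
    while status_code != 200:
        if method == "get":
            r = requests.get(url)
        else:
            r = requests.post(url)
        status_code = r.status_code
        print(status_code)
    return r.content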

def parse_person_profile(relative, info=None):
    '''
    retrieve the information from the personal profile page
    '''
    # avoid a mutable default argument shared between calls
    if info is None:
        info = {}

    r = read_page("http://home.cnblogs.com/u%s" % relative)
    soup = BeautifulSoup.BeautifulSoup(r)

    # the count of the followers (the key means "follower count")
    el = soup.find("a", {'id': "follower_count"})
    info['粉絲數'] = int(el.getText())

    # the time of the registration (the key means "blog start date")
    el = soup.find("div", {'id': "ctl00_cphMain_panel_profile"})
    reg_time = el.ul.findChildren()[0]
    raw = reg_time.getText()
    # the first three numbers in the text are year, month, day
    m = [int(x) for x in re.findall(r"(\d+)", raw)]
    dt = datetime(year=m[0], month=m[1], day=m[2])
    info['開博時間'] = dt.strftime("%Y%m%d")
    return info

def cnblogs_recommend_150():
    '''
    workhorse: scrape the recommended list, then each blogger's profile
    '''
    url = "http://www.cnblogs.com/aggsite/ExpertBlogs"
    r = read_page(url, method="post")
    soup = BeautifulSoup.BeautifulSoup(r)

    # retrieve the information blogger by blogger
    info = []
    anchors = soup.findAll('a')
    # keys: '昵稱' = nickname, '排名' = rank (1-based, in list order)
    for i, a in enumerate(anchors):
        name = a.getText()
        p_info = {'昵稱': name, '排名': i + 1}
        # parse_person_main(a['href'], p_info)
        parse_person_profile(a['href'], p_info)

        info.append(p_info)

    # write the retrieved data into the file
    with open("info", "w") as f:
        rlt = json.dumps(info, indent=4, encoding="UTF-8", ensure_ascii=False)
        f.write(rlt.encode("utf8"))
    return info

if __name__ == "__main__":
    info = cnblogs_recommend_150()
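The `read_page` function above keeps requesting a page until it gets HTTP 200, which can loop forever if a profile page is gone and hammers the server in the meantime. A bounded retry with a pause between attempts is one possible alternative. This is a sketch under stated assumptions and not part of the original crawler: `fetch_with_retry`, its parameters, and the `fetch` callable (a stand-in for `requests.get` or `requests.post` that returns a status code and body) are all hypothetical.

```python
import time


def fetch_with_retry(fetch, url, max_tries=5, delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it reports status 200 or the tries run out.

    fetch is any callable returning (status_code, body); it is a
    hypothetical stand-in for a real HTTP request.
    """
    last_status = None
    for attempt in range(max_tries):
        last_status, body = fetch(url)
        if last_status == 200:
            return body
        # pause before the next attempt, but not after the last one
        if attempt < max_tries - 1:
            sleep(delay)
    raise RuntimeError("giving up on %s, last status %s" % (url, last_status))


# Demonstration with a fake fetcher that fails twice, then succeeds.
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) >= 3:
        return (200, "<html></html>")
    return (503, "")


body = fetch_with_retry(flaky_fetch, "http://example.invalid/",
                        sleep=lambda seconds: None)
```

Passing the fetcher and the sleep function in as arguments keeps the sketch testable without touching the network.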

Location of the JavaScript code.
