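Without further ado, let's look at how CSDN counts article views. The counter goes up by one each time the page is loaded, so all we need is a small crawler that visits the pages. Let's get started.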
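    import random
    import time

    import requests
    from lxml import etree

    # Article links collected by post_article() and read by access_url()
    article_url = []

    def post_article():
        '''Replace the url below with your own to collect the links of all your blog posts'''
        # 'me_url' is a placeholder for the address of your own article list page
        response = requests.get(url='me_url', headers=getHeaders())
        text = response.content.decode('utf-8')
        html = etree.HTML(text)
        # Each article title is assumed to be an <a> directly inside an <h4>
        urls = html.xpath('//h4/a/@href')
        for url in urls:
            article_url.append(url)

    def access_url():
        '''Visit one url, picked at random from your own blog posts'''
        try:
            url = random.choice(article_url)
            response = requests.get(url, headers=getHeaders())
            # Pause between visits so the requests are not too frequent
            time.sleep(2)
        except Exception as e:
            print(e)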
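To see what the link-collection step is meant to pull out without touching the network, here is a minimal sketch that runs on a made-up HTML snippet. It uses the standard library's `html.parser` as a stand-in for the `//h4/a/@href` XPath, so it only approximates it (it accepts any `<a>` nested inside an `<h4>`, not only direct children). `SAMPLE_HTML`, `H4LinkCollector` and `extract_article_links` are hypothetical names, and the real CSDN page layout may well differ.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real article list page may be structured differently.
SAMPLE_HTML = """
<div>
  <h4><a href="https://example.com/post/1">First post</a></h4>
  <h4><a href="https://example.com/post/2">Second post</a></h4>
  <p><a href="https://example.com/about">About</a></p>
</div>
"""


class H4LinkCollector(HTMLParser):
    """Collect href values of <a> tags that appear inside <h4> tags."""

    def __init__(self):
        super().__init__()
        self.h4_depth = 0   # how many <h4> elements are currently open
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "h4":
            self.h4_depth += 1
        elif tag == "a" and self.h4_depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "h4" and self.h4_depth > 0:
            self.h4_depth -= 1


def extract_article_links(html_text):
    """Return the hrefs of links found inside <h4> headings, in document order."""
    parser = H4LinkCollector()
    parser.feed(html_text)
    return parser.links


if __name__ == "__main__":
    for link in extract_article_links(SAMPLE_HTML):
        print(link)
```

The link inside the `<p>` is skipped, which is the point of anchoring the selector on `<h4>`: only article titles should end up in the list.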
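With the code above, your blog's view count will shoot up. Sigh, it's a little sad to have to rely on this.
A few things to watch out for: set the request headers and a sleep interval, because if you visit too frequently the server will refuse to increase your view count. You ok? (Broken English.)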
Now add the headers we set:
    def getHeaders():
        '''Return request headers with a randomly chosen User-Agent'''
        user_agent_list = [
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/22.0.1207.1 Safari/537.1",
            "Mozilla/5.0 (X11; CrOS i686 2268.111.0) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.57 Safari/536.11",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1092.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.6 (KHTML, like Gecko) Chrome/20.0.1090.0 Safari/536.6",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/19.77.34.5 Safari/537.1",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.9 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.0) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.36 Safari/536.5",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_0) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1063.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1062.0 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.1 Safari/536.3",
            "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/536.3 (KHTML, like Gecko) Chrome/19.0.1061.0 Safari/536.3",
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/535.24 (KHTML, like Gecko) Chrome/19.0.1055.1 Safari/535.24",
        ]
        UserAgent = random.choice(user_agent_list)
        headers = {'User-Agent': UserAgent}
        return headers
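One pitfall worth checking in a long list of string literals like `user_agent_list`: Python joins adjacent string literals when the comma between them is missing, so two entries silently become one malformed entry. A small sketch with made-up placeholder strings (`agent-A` and friends, plus the `pick_headers` helper, are hypothetical and not part of the original script):

```python
import random

# The comma after "agent-A" is missing, so Python concatenates it with "agent-B".
with_missing_comma = [
    "agent-A"
    "agent-B",
    "agent-C",
]

# Same list with every entry separated by a comma.
with_commas = [
    "agent-A",
    "agent-B",
    "agent-C",
]


def pick_headers(agents):
    """Build a headers dict with one randomly chosen User-Agent string."""
    return {"User-Agent": random.choice(agents)}


if __name__ == "__main__":
    print(len(with_missing_comma), with_missing_comma[0])
    print(len(with_commas), with_commas[0])
    print(pick_headers(with_commas))
```

Comparing `len()` of the list with the number of entries you expect is a quick way to catch this.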
The main program block:
    if __name__ == '__main__':
        index = 0
        # Collect the article links first
        post_article()
        print('Got this far...')
        while True:
            access_url()
            print(index)
            index += 1
            # The number of iterations is arbitrary; pick your own
            if index == 100000:
                break
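The counting loop can also be written as a bounded `for` loop. The sketch below is one possible variant, not the original author's code: `run_visits` is a hypothetical helper that takes the visit function and the sleep function as parameters, so it can be tried without any network access, and it pauses in a `finally` clause so that a failed visit still waits before the next attempt.

```python
import time


def run_visits(visit, limit, delay_seconds=2.0, sleep=time.sleep):
    """Call visit() up to `limit` times, pausing after every attempt.

    Returns the number of calls that finished without raising.
    """
    completed = 0
    for index in range(limit):
        try:
            visit()
            completed += 1
        except Exception as e:
            # Report the error and carry on with the next attempt
            print(e)
        finally:
            # Pause after failures too, so errors do not cause a tight loop
            sleep(delay_seconds)
    return completed


if __name__ == "__main__":
    # Stand-in visit function for a dry run; no request is sent.
    print(run_visits(lambda: None, limit=3, delay_seconds=0.1))
```

Passing `sleep` as a parameter is purely a convenience for dry runs and tests; in normal use the default `time.sleep` applies.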
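And that's the little crawler. Don't overuse it; it is only for learning the technique, and any disputes that arise have nothing to do with me (trembling).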