Python 3.4 Study Notes (14): Web Crawler Example Code, Scraping Sina Aicai Double Color Ball (雙色球) Draw Results

This article is a repost. 2015-07-03 14:23. Tags: python3 / web crawler / crawler / python / web scraping / string slicing

URL of the Sina Aicai Double Color Ball draw results: http://zst.aicai.com/ssq/openInfo/

The final output looks like this: 2015075期開獎號碼：6,11,13,19,21,32, 藍球：4 (that is, draw 2015075, red balls 6, 11, 13, 19, 21, 32, blue ball 4)

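The code that fetches the latest Double Color Ball draw is written in plain Python with no framework; it pulls out the data by searching for marker strings and slicing between them. In testing it ran quickly.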
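The string-slicing idea can be sketched offline on a small sample before pointing it at the live page. This is only an illustration: the helper name `between` and the `SAMPLE_ROW` markup are assumptions made up for this sketch (the markup is loosely modeled on the class names used in the script further down), not the site's actual HTML.

```python
def between(text, start, end, pos=0):
    """Return (substring between start and end, index just past end).

    Searching begins at pos; ValueError is raised if a marker is missing.
    """
    i = text.index(start, pos) + len(start)
    j = text.index(end, i)
    return text[i:j], j + len(end)


# Hypothetical sample row (assumption: simplified, not the real page markup).
SAMPLE_ROW = (
    '<tr><td>2015075</td>'
    + ''.join('<td class="redColor">%d</td>' % n for n in (6, 11, 13, 19, 21, 32))
    + '<td class="blueColor">4</td></tr>'
)

# The draw number is the first plain <td> cell.
issue, pos = between(SAMPLE_ROW, '<td>', '</td>')

# Keep slicing out red cells until no further red marker is found.
reds = []
while True:
    try:
        red, pos = between(SAMPLE_ROW, '<td class="redColor">', '</td>', pos)
    except ValueError:
        break
    reds.append(red)

blue, _ = between(SAMPLE_ROW, '<td class="blueColor">', '</td>')

line = '{} reds: {} blue: {}'.format(issue, ','.join(reds), blue)
print(line)
```

Carrying the search position forward means each cell is found in document order, and a missing marker raises an error instead of silently producing a wrong slice.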

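pyspider makes it easy to pick out the content you need, but deploying a framework is hardly worthwhile for a small program that scrapes only one specific piece of data.
For ordinary page scraping, BeautifulSoup is enough; pyspider is only needed for real crawler-style applications.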

Python 3.4 Study Notes (17): Web Crawler, Scraping Content with Beautifulsoup4 - 流風，飄然的風 - cnblogs (博客園)
http://www.cnblogs.com/zdz8207/p/python_learn_note_17.html

Compared with searching and slicing strings directly, BeautifulSoup4 is more intuitive and more concise.
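To give a feel for why tag-based parsing is less brittle than slicing, here is a minimal sketch. Note the swap: BeautifulSoup4 is a third-party package, so this example uses the standard library's `html.parser` as a dependency-free stand-in for the same idea; it is not BeautifulSoup code. The class name `DrawRowParser` and the `SAMPLE` markup are assumptions invented for the illustration.

```python
from html.parser import HTMLParser


class DrawRowParser(HTMLParser):
    """Collect the text of each <td> cell together with its class attribute."""

    def __init__(self):
        super().__init__()
        self.cells = []      # (class_string, text) for every closed <td>
        self._class = None   # class of the <td> currently open, else None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._class = dict(attrs).get('class') or ''
            self._text = []

    def handle_data(self, data):
        if self._class is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'td' and self._class is not None:
            self.cells.append((self._class, ''.join(self._text).strip()))
            self._class = None


# Hypothetical sample (assumption): odd spacing and line breaks on purpose.
SAMPLE = '''<tr
    onmouseout="this.style.background=''">
  <td   >2015075</td>
  <td  class="redColor sz12" >6</td>
  <td  class="redColor sz12" >11</td>
  <td  class="blueColor sz12" >4</td>
</tr>'''

parser = DrawRowParser()
parser.feed(SAMPLE)
parser.close()

issue = next(text for cls, text in parser.cells if cls == '')
reds = [text for cls, text in parser.cells if 'redColor' in cls.split()]
blue = next(text for cls, text in parser.cells if 'blueColor' in cls.split())
print(issue, reds, blue)
```

Because cells are matched by tag and class rather than by exact character sequences, extra spaces or line breaks inside the tags do not affect the result.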

The code has been published as an open-source project: 熱血狂徒 / zyspider - code hosting - OSChina (開源中國社區)
http://git.oschina.net/coos/zyspider

====================================

import http.cookiejar
import urllib.request


def getHtml(url):
    """Fetch url through a cookie-aware opener and return the body as a UTF-8 string."""
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    # Send a browser-like User-Agent plus a placeholder Cookie value
    opener.addheaders = [
        ('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36'),
        ('Cookie', '4564564564564564565646540'),
    ]
    urllib.request.install_opener(opener)

    html_bytes = urllib.request.urlopen(url).read()
    html_string = html_bytes.decode('utf-8')
    return html_string


# url = http://zst.aicai.com/ssq/openInfo/
# Final output looks like: 2015075期開獎號碼：6,11,13,19,21,32, 藍球：4
html = getHtml("http://zst.aicai.com/ssq/openInfo/")

# The results live in <table class="fzTab nbt"> ... </table>.
# Search for the closing tag only after the table start, so an earlier
# </table> elsewhere on the page cannot produce an empty slice.
start = html.find('<table class="fzTab nbt">')
table = html[start : html.find('</table>', start)]
#print(table)

# Each data row starts like:
# <tr onmouseout="this.style.background=''" onmouseover="this.style.background='#fff7d8'">
# but in the raw source the tag is broken across lines: <tr \r\n\t\t                  onmouseout=
tmp = table.split('<tr \r\n\t\t                  onmouseout=', 1)
#print(tmp)
#print(len(tmp))
trs = tmp[1]                      # everything from the first (latest) data row onward
tr = trs[: trs.find('</tr>')]     # just that first row
#print(tr)

# Draw number: the first plain <td> cell
number = tr.split('<td   >')[1].split('</td>')[0]
print(number + '期開獎號碼：', end='')   # prints "<draw number>期開獎號碼：" (winning numbers of this draw)

# Red balls
redtmp = tr.split('<td  class="redColor sz12" >')
reds = redtmp[1:-1]  # drop the first and last pieces, which are not needed
#print(reds)
for redstr in reds:
    print(redstr.split('</td>')[0] + ",", end='')

# Blue ball
print('藍球：', end='')            # "藍球" means blue ball
blue = tr.split('<td  class="blueColor sz12" >')[1].split('</td>')[0]
print(blue)
