Writing a Python crawler to scrape lottery website data and store it in a MongoDB database


1. Preparation:

1.1 Install requests: cmd >> pip install requests
1.2 Install lxml: cmd >> pip install lxml
1.3 Install wheel: cmd >> pip install wheel
1.4 Install xlwt: cmd >> pip install xlwt
1.5 Install pymongo: cmd >> pip install pymongo
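
Before running the crawler, it can help to confirm that the packages actually import. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` list are illustrative and not part of the original script. `wheel` is left out because the crawler never imports it.

```python
import importlib.util

# Import names of the packages installed above, as they appear in `import` statements.
REQUIRED = ["requests", "lxml", "xlwt", "pymongo"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, rerun the matching pip install command from the list above.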

Complete code
import requests
from lxml import etree
import xlwt
from pymongo import MongoClient

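# Set browser-like request headers so the server treats the request as coming from a browser; this reduces the chance of being blocked by the site's anti-scraping checks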
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
    'Accept-Encoding': 'gzip, deflate',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Connection': 'keep-alive'
}

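# Connect to the local MongoDB server and select the database and collection (MongoDB creates them on the first write)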
client = MongoClient()
database = client['Chapter6']
collection = database['webdata']



for i in range(1, 21):
    url = "http://kaijiang.zhcw.com/zhcw/html/3d/list_{}.html".format(i)
    # Send the request and fetch the page
    response = requests.get(url=url,headers=headers)
    #print(response.text)

    # Parse the HTML into an element tree that can be queried with XPath
    res_xpath = etree.HTML(response.text)
    trs = res_xpath.xpath('/html/body/table//tr')


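    # Write the data to the MongoDB database.
    # trs[2:-1] skips the first two rows and the last row, which this script treats as non-data rows.
    for tr in trs[2:-1]:
        data = {
            '開獎日期': tr.xpath("./td[1]/text()")[0],          # draw date
            '期號': tr.xpath("./td[2]/text()")[0],              # issue number
            '中獎號碼1': tr.xpath("./td[3]/em[1]/text()")[0],   # winning number 1
            '中獎號碼2': tr.xpath("./td[3]/em[2]/text()")[0],   # winning number 2
            '中獎號碼3': tr.xpath("./td[3]/em[3]/text()")[0],   # winning number 3
            '銷售額(元)': tr.xpath("./td[4]/text()")[0],        # sales amount (yuan)
            '返獎比例': tr.xpath("./td[5]/text()")[0]           # payout ratio
        }
        collection.insert_one(data)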
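
One fragile spot in the code above is the `[0]` indexing: `xpath()` returns a list, so a row with a missing cell raises `IndexError` and stops the whole run. The following is a minimal sketch of one way to guard against that. The helpers `first_text` and `is_complete` are hypothetical names introduced here and are not part of the original script.

```python
def first_text(values, default=None):
    """Return the first item of an XPath result list, stripped, or `default` if the list is empty."""
    if not values:
        return default
    return str(values[0]).strip()


def is_complete(record):
    """Return True only if every field of the record was found."""
    return all(value is not None for value in record.values())


# Hypothetical usage inside the row loop of the crawler:
#     data = {
#         '期號': first_text(tr.xpath("./td[2]/text()")),
#         ...
#     }
#     if is_complete(data):
#         collection.insert_one(data)
```

With this approach an incomplete row is skipped instead of ending the loop. Whether skipping is the right behavior depends on how the data will be used.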

Result
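
To check what was written, the collection can be read back with pymongo. The sketch below assumes a MongoDB server running on the default local address and reuses the `Chapter6` database and `webdata` collection names from the code above. The function names `summarize` and `main` are illustrative only.

```python
def summarize(collection, limit=3):
    """Return the total document count and the first few documents (without _id)."""
    total = collection.count_documents({})
    sample = list(collection.find({}, {"_id": 0}).limit(limit))
    return total, sample


def main():
    # Imported here so summarize() can be reused without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient()  # assumes the default localhost:27017 server
    collection = client["Chapter6"]["webdata"]
    total, sample = summarize(collection)
    print("Documents stored:", total)
    for doc in sample:
        print(doc)
```

Calling `main()` after the crawler has finished prints the number of stored documents and a small sample. Note that the crawler uses `insert_one`, so running it more than once stores the same rows again.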

 

