Hands-on: Fetching Epidemic Data with a Python Crawler and Storing It in a MySQL Database


    Last time I built a visualization chart of nationwide epidemic statistics. This time I wanted to see whether I could keep the data in the database up to date automatically. The first thing that came to mind was a Python crawler: it is easy to work with, and learning Python is something I will need down the road anyway.

    After reading up on it online, here is the code:

import requests  # fetch the JSON feed
import pymysql   # talk to MySQL
import json      # parse the response


def create():
    """(Re)create the info table that will hold one row per city."""
    db = pymysql.connect(host="localhost", user="root", password="0000",
                         database="grabdata_test", charset="utf8")  # connect to MySQL

    cursor = db.cursor()
    cursor.execute("DROP TABLE IF EXISTS info")  # start from a clean table

    sql = """CREATE TABLE info (
            Id INT PRIMARY KEY AUTO_INCREMENT,
            Date VARCHAR(255),
            Province VARCHAR(255),
            City VARCHAR(255),
            Confirmed_num VARCHAR(255),
            Yisi_num VARCHAR(255),
            Cured_num VARCHAR(255),
            Dead_num VARCHAR(255),
            Code VARCHAR(255))"""

    cursor.execute(sql)

    db.close()


def insert(value):
    """Insert one (date, province, city, ...) tuple into the info table."""
    db = pymysql.connect(host="localhost", user="root", password="0000",
                         database="grabdata_test", charset="utf8")

    cursor = db.cursor()
    sql = ("INSERT INTO info(Date,Province,City,Confirmed_num,Yisi_num,Cured_num,Dead_num,Code) "
           "VALUES (%s,%s,%s,%s,%s,%s,%s,%s)")
    try:
        cursor.execute(sql, value)
        db.commit()
        print('row inserted')
    except Exception as e:
        db.rollback()
        print("insert failed:", e)
    db.close()


create()  # create the table

url = 'https://raw.githubusercontent.com/BlankerL/DXY-2019-nCoV-Data/master/json/DXYArea.json'
response = requests.get(url)
# The response body is the epidemic feed as one JSON string
versionInfo = response.text
# print(versionInfo)  # uncomment to inspect the raw data

# json.load reads from a file object; json.loads parses a string already in memory
jsonData = json.loads(versionInfo)
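
# For orientation, the parsed structure looks roughly like this (only the fields
# used below are shown; the real feed carries additional fields per entry):
# jsonData['results'] -> list of province-level entries, e.g.
#   {'countryName': '中國', 'provinceName': ..., 'cities': [
#       {'cityName': ..., 'confirmedCount': ..., 'suspectedCount': ...,
#        'curedCount': ..., 'deadCount': ..., 'locationId': ...}, ...]}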

# Walk the feed and insert one row per Chinese city
for result in jsonData['results']:
    if result['countryName'] != '中國':   # keep only entries for China (value as it appears in the feed)
        continue
    provinceShortName = result['provinceName']
    if provinceShortName == '待明確地區':  # skip the "region to be determined" bucket
        continue

    for city in result.get('cities', []):
        confirmnum = city['confirmedCount']
        yisi_num = city['suspectedCount']   # suspected cases
        cured_num = city['curedCount']
        dead_num = city['deadCount']
        code = city['locationId']
        cityname = city['cityName']
        date = '2020-3-10'
        insert((date, provinceShortName, cityname, confirmnum,
                yisi_num, cured_num, dead_num, code))
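
A side note on the helper above: insert() opens and closes a new connection for every city, which gets slow once there are hundreds of rows. The sketch below is only an alternative idea, not the code used above; it introduces a hypothetical helper named insert_many that reuses one connection and commits once via executemany:

def insert_many(values):
    """Hypothetical batch variant of insert(): one connection, one commit."""
    db = pymysql.connect(host="localhost", user="root", password="0000",
                         database="grabdata_test", charset="utf8")
    cursor = db.cursor()
    sql = ("INSERT INTO info(Date,Province,City,Confirmed_num,Yisi_num,"
           "Cured_num,Dead_num,Code) VALUES (%s,%s,%s,%s,%s,%s,%s,%s)")
    try:
        cursor.executemany(sql, values)  # values is a list of 8-tuples
        db.commit()
        print('inserted', cursor.rowcount, 'rows')
    except Exception as e:
        db.rollback()
        print("batch insert failed:", e)
    db.close()

Collecting the tuples into a list inside the loop and calling insert_many(rows) once at the end would then replace the per-row insert() calls.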

This run crawls the epidemic data published at https://raw.githubusercontent.com/BlankerL/DXY-2019-nCoV-Data/master/json/DXYArea.json. The output in PyCharm:

 

 Now let's look at what ended up in the database:

 

     We can see that the data has been imported into the database successfully!
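
If you prefer to check from Python instead of a database client, a minimal spot check might look like the sketch below (it assumes the same local credentials used earlier):

import pymysql

db = pymysql.connect(host="localhost", user="root", password="0000",
                     database="grabdata_test", charset="utf8")
cursor = db.cursor()
cursor.execute("SELECT COUNT(*) FROM info")
print("rows in info:", cursor.fetchone()[0])
cursor.execute("SELECT Date, Province, City, Confirmed_num FROM info LIMIT 5")
for row in cursor.fetchall():
    print(row)
db.close()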

     Next, I tried to visualize the data by reusing the chart from the previous post; the result is shown below:
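
The chart itself comes from the previous post and is not reproduced here. Purely as an illustration, the sketch below pulls per-province confirmed totals back out of MySQL and renders a bar chart with pyecharts; the library choice and output file name are my own assumptions, not necessarily what the earlier chart used:

import pymysql
from pyecharts import options as opts
from pyecharts.charts import Bar

db = pymysql.connect(host="localhost", user="root", password="0000",
                     database="grabdata_test", charset="utf8")
cursor = db.cursor()
# Confirmed_num is stored as VARCHAR, so cast it before summing
cursor.execute("SELECT Province, SUM(CAST(Confirmed_num AS UNSIGNED)) "
               "FROM info GROUP BY Province ORDER BY 2 DESC")
rows = cursor.fetchall()
db.close()

provinces = [r[0] for r in rows]
confirmed = [int(r[1]) for r in rows]

bar = (
    Bar()
    .add_xaxis(provinces)
    .add_yaxis("Confirmed", confirmed)
    .set_global_opts(title_opts=opts.TitleOpts(title="Confirmed cases by province"))
)
bar.render("epidemic_bar.html")  # open the generated HTML file in a browser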

With that, this Python crawler exercise counts as a success! I could almost cry with relief...

I ran into plenty of problems and bugs along the way in PyCharm; I'll save that tale of woe for the next blog post.

