Topic-Focused Web Crawler: Scraping Data from the China Weather Site

This article is a repost. Posted 2021-12-26 18:45.

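Background

Weather changes over a single day affect many areas of life. In agriculture, forecasts help people plan their work and daily routines. Forecasts of severe weather in particular help protect lives and property and support economic development. A topic-focused web crawler written in Python can discover and collect as many pages related to a chosen topic as possible, and it can analyze page content and judge how relevant a page is to that topic. This project crawls the China Weather site to collect weather data.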

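Crawler design

(1) Crawler name: scraping data from the China Weather site

(2) Content to scrape

Use Python to scrape one city's weather for the coming week from the China Weather site: weather conditions, daily low, daily high, and wind level.

(3) Data analysis

Use the third-party Python libraries requests and Beautiful Soup to fetch and parse the pages, and pandas and numpy to compute statistics and compare how pairs of variables relate.

(4) Plan overview

Send the page request

Parse the returned page

Extract the data

Save the data to a file

Clean the data

Compute statistics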

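HTML page analysis

The site crawled is http://www.weather.com.cn

Open the home page (screenshot not included), right-click, and use the browser's Inspect tool to view the page elements and source code. You can also click a piece of key information on the page to see which tag holds the field you need.

The page source shows that every field we need sits inside the ul within the div whose id is "7d" (the <ul> tag defines an unordered list). Each day is one li. The date is in the h1 tag inside the li, and the weather description is in the first p tag. The high temperature is in the span inside the second p tag, and the low temperature is in the i tag inside the second p tag. The wind level is in the i tag inside the third p tag. (The figure of this structure is not included.)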
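To make the tag layout described above concrete, here is a minimal sketch that walks a hand-written fragment with the same nesting. It uses the standard library's xml.etree.ElementTree rather than the article's Beautiful Soup, purely so that it runs without extra packages; real pages are not well-formed XML, so this is not a substitute for the scraper. The fragment, the class names, and the helper parse_day are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written, well-formed stand-in for one day's entry (not real site output).
SAMPLE = """
<body>
  <div id="7d">
    <ul>
      <li>
        <h1>27日（明天）</h1>
        <p class="wea">多云</p>
        <p class="tem"><span>18℃</span>/<i>9℃</i></p>
        <p class="win"><em><span title="北风"></span></em><i>3-4级</i></p>
      </li>
    </ul>
  </div>
</body>
"""

def parse_day(li):
    """Pull date, weather, high, low and wind level out of one <li> element."""
    paragraphs = li.findall("p")
    high = paragraphs[1].find("span")           # the high may be absent
    low = paragraphs[1].find("i")
    wind_text = paragraphs[2].find("i").text
    # Take every number in the wind text and keep the largest one.
    levels = [int(n) for n in re.findall(r"\d+", wind_text)]
    return {
        "date": li.find("h1").text,
        "weather": paragraphs[0].text,
        "high": high.text if high is not None else None,
        "low": low.text,
        "wind_level": max(levels) if levels else None,
    }

root = ET.fromstring(SAMPLE)
block = root.find(".//div[@id='7d']")           # the div with id 7d
days = [parse_day(li) for li in block.findall("./ul/li")]
print(days[0])
```

Reading the wind level with a regular expression, instead of taking the single character before the level suffix, also copes with two-digit levels.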

Locating and traversing nodes

Scraping the pages

import requests
from bs4 import BeautifulSoup
import csv
import json

def getHTMLtext(url):
    """Request the page and return its text."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        print("Request succeeded")
        return r.text
    except requests.RequestException:
        print("Request failed")
        return " "

def get_content(html):
    """Parse the 7-day page; return (today's hourly rows, daily forecast rows)."""
    final = []                                 # daily forecast rows
    bs = BeautifulSoup(html, "html.parser")    # build the BeautifulSoup object
    body = bs.body
    data = body.find('div', {'id': '7d'})      # the div whose id is 7d
    # Today's hourly data
    data2 = body.find_all('div', {'class': 'left-div'})
    text = data2[2].find('script').string
    text = text[text.index('=')+1:-2]          # drop the leading "var ... =" and the last two characters, leaving JSON
    jd = json.loads(text)
    dayone = jd['od']['od2']                   # hourly records for today
    list_day = []                              # today's hourly rows
    count = 0
    for i in dayone:
        temp = []
        if count <= 23:
            temp.append(i['od21'])             # hour
            temp.append(i['od22'])             # temperature at that hour
            temp.append(i['od24'])             # wind direction
            temp.append(i['od25'])             # wind level
            temp.append(i['od26'])             # precipitation
            temp.append(i['od27'])             # relative humidity
            temp.append(i['od28'])             # air quality
            list_day.append(temp)
        count = count + 1
    # Daily forecast from the 7-day block
    ul = data.find('ul')                       # the ul inside the div
    li = ul.find_all('li')                     # all li tags
    i = 0                                      # day counter
    for day in li:                             # walk through each li
        if i < 7 and i > 0:                    # skips index 0 (today) and keeps the next six days
            temp = []                          # one day's row
            date = day.find('h1').string       # date text
            date = date[0:date.index('日')]    # day-of-month number
            temp.append(date)
            inf = day.find_all('p')            # p tags under this li
            temp.append(inf[0].string)         # first p: weather description

            tem_low = inf[1].find('i').string  # daily low

            if inf[1].find('span') is None:    # the forecast may have no high
                tem_high = None
            else:
                tem_high = inf[1].find('span').string  # daily high
            temp.append(tem_low[:-1])          # strip the trailing ℃
            if tem_high is not None and tem_high[-1] == '℃':
                temp.append(tem_high[:-1])
            else:
                temp.append(tem_high)

            wind = inf[2].find_all('span')     # wind directions
            for j in wind:
                temp.append(j['title'])

            wind_scale = inf[2].find('i').string   # wind level text
            index1 = wind_scale.index('级')        # page text is Simplified Chinese
            temp.append(int(wind_scale[index1-1:index1]))
            final.append(temp)
        i = i + 1
    return list_day, final

def get_content2(html):
    """Parse the 15-day page; return rows for the first eight days listed."""
    final = []                                                      # rows to return
    bs = BeautifulSoup(html, "html.parser")                         # build the BeautifulSoup object
    body = bs.body
    data = body.find('div', {'id': '15d'})                          # the div whose id is 15d
    ul = data.find('ul')                                            # the ul inside the div
    li = ul.find_all('li')                                          # all li tags
    i = 0                                                           # day counter
    for day in li:                                                  # walk through each li
        if i < 8:
            temp = []                                               # one day's row
            date = day.find('span', {'class': 'time'}).string       # date text
            date = date[date.index('（')+1:-2]                       # day-of-month number
            temp.append(date)
            weather = day.find('span', {'class': 'wea'}).string     # weather description
            temp.append(weather)
            tem = day.find('span', {'class': 'tem'}).text           # "high/low" temperature text
            temp.append(tem[tem.index('/')+1:-1])                   # daily low
            temp.append(tem[:tem.index('/')-1])                     # daily high
            wind = day.find('span', {'class': 'wind'}).string       # wind direction
            if '转' in wind:                                        # the direction changes during the day
                temp.append(wind[:wind.index('转')])
                temp.append(wind[wind.index('转')+1:])
            else:                                                   # no change: same direction twice
                temp.append(wind)
                temp.append(wind)
            wind_scale = day.find('span', {'class': 'wind1'}).string  # wind level text
            index1 = wind_scale.index('级')
            temp.append(int(wind_scale[index1-1:index1]))
            final.append(temp)
        i = i + 1
    return final

Result of the run (screenshot not included).
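The hourly data is read from a JavaScript assignment embedded in the page, which get_content turns into JSON by slicing the string. The sketch below shows the same idea on a made-up one-line script. The variable name, the sample values, and the helper extract_json are assumptions for illustration; only the od/od2/od21... keys come from the article's code.

```python
import json

# Hypothetical stand-in for the inline <script> text.
script_text = (
    'var observe24h_data = {"od": {"od2": ['
    '{"od21": "14", "od22": "25", "od27": "60"}, '
    '{"od21": "13", "od22": "24", "od27": "63"}]}};'
)

def extract_json(text):
    """Return the object assigned in a one-line 'var name = {...};' statement."""
    payload = text[text.index("=") + 1:]      # everything after the first '='
    payload = payload.strip().rstrip(";")     # drop whitespace and the trailing ';'
    return json.loads(payload)

jd = extract_json(script_text)
# Keep at most 24 hourly records: hour, temperature, relative humidity.
rows = [[h["od21"], h["od22"], h["od27"]] for h in jd["od"]["od2"][:24]]
print(rows)
```

Stripping the trailing semicolon explicitly, rather than cutting a fixed number of characters, keeps the parse working whether or not the script ends with a newline.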

Saving the data

def write_to_csv(file_name, data, day=14):
    """Save the rows to a CSV file."""
    # 'w' overwrites, so a rerun does not append a second header row;
    # gbk can encode both Simplified and Traditional Chinese text
    with open(file_name, 'w', encoding='gbk', errors='ignore', newline='') as f:
        if day == 14:
            header = ['日期','天氣','最低氣溫','最高氣溫','風向1','風向2','風級']
        else:
            header = ['小時','溫度','風力方向','風級','降水量','相對濕度','空氣質量']
        f_csv = csv.writer(f)
        f_csv.writerow(header)
        f_csv.writerows(data)

def main():
    """Entry point."""
    print("Weather test")
    url1 = 'http://www.weather.com.cn/weather/101280701.shtml'    # 7-day forecast page
    url2 = 'http://www.weather.com.cn/weather15d/101280701.shtml' # day 8-15 forecast page

    html1 = getHTMLtext(url1)
    data1, data1_7 = get_content(html1)                           # today's hourly rows and the daily rows

    html2 = getHTMLtext(url2)
    data8_14 = get_content2(html2)                                # rows for the later days
    data14 = data1_7 + data8_14
    write_to_csv('weather14.csv', data14, 14)                     # daily forecast file
    write_to_csv('weather1.csv', data1, 1)                        # hourly file

if __name__ == '__main__':
    main()

The resulting files (screenshot not included).
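The plotting code later in the article reads these CSV files back with a fixed encoding, so the writer and the reader have to agree on it. Below is a minimal round-trip sketch, assuming gbk on both sides and a temporary file; the helper names save_rows and load_rows, the file name, and the sample values are hypothetical.

```python
import csv
import os
import tempfile

HEADER = ['小時', '溫度', '相對濕度']      # subset of the article's hourly header
ROWS = [[0, 18, 80], [1, 17, 82]]          # made-up sample values

def save_rows(path, header, rows, encoding='gbk'):
    """Write header + rows, replacing any earlier file content."""
    with open(path, 'w', encoding=encoding, newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)

def load_rows(path, encoding='gbk'):
    """Read the file back as a list of dicts keyed by header name."""
    with open(path, encoding=encoding, newline='') as f:
        return list(csv.DictReader(f))

path = os.path.join(tempfile.mkdtemp(), 'weather_demo.csv')
save_rows(path, HEADER, ROWS)
save_rows(path, HEADER, ROWS)              # a second run still leaves one header
loaded = load_rows(path)
print(len(loaded), loaded[0]['溫度'])
```

Opening the file in write mode keeps a single header row no matter how often the script runs; in append mode each run would add another header, which a reader would then see as a data row.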

Data cleaning and visualization

1. Relative humidity over one day

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math

data = pd.read_csv('weather1.csv', encoding='gbk')   # gbk is a superset of gb2312
# Relative humidity curve
hour = list(data['小時'])
hum = list(data['相對濕度'])
for i in range(0, 24):
    if math.isnan(hum[i]):
        hum[i] = hum[i-1]                  # fill a missing value with the previous entry
hum_ave = sum(hum)/24                      # mean relative humidity
hum_max = max(hum)
hum_max_hour = hour[hum.index(hum_max)]    # hour of the highest relative humidity
hum_min = min(hum)
hum_min_hour = hour[hum.index(hum_min)]    # hour of the lowest relative humidity
x = []
y = []
for i in range(0, 24):
    x.append(i)
    y.append(hum[hour.index(i)])           # reorder the values by hour 0..23
plt.figure(2)
plt.plot(x, y, color='green', label='Relative humidity')              # humidity curve
plt.scatter(x, y, color='green')                                      # one point per hour
plt.plot([0, 24], [hum_ave, hum_ave], c='red', linestyle='--', label='Mean relative humidity')
plt.text(hum_max_hour+0.2, hum_max+0.2, str(hum_max), ha='center', va='bottom', fontsize=10.5)  # label the maximum
plt.text(hum_min_hour+0.2, hum_min+0.2, str(hum_min), ha='center', va='bottom', fontsize=10.5)  # label the minimum
plt.xticks(x)
plt.legend()
plt.title('Relative humidity over one day')
plt.xlabel('Time/h')
plt.ylabel('Percent/%')
plt.show()

Result (chart not included).
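The gap-filling loop in the humidity code copies the previous entry into a missing one. At index 0 the "previous" entry is the last item of the list, which may itself be missing. The following is a small sketch of a forward fill that handles a missing first value separately; the function name fill_missing and the sample readings are assumptions for illustration.

```python
import math

def fill_missing(values):
    """Replace NaN entries with the nearest earlier valid value.

    Leading NaNs, which have no earlier value, take the first valid value.
    """
    filled = list(values)
    first_valid = next((v for v in filled if not math.isnan(v)), None)
    if first_valid is None:
        return filled                      # nothing to fill from
    previous = first_valid
    for idx, value in enumerate(filled):
        if math.isnan(value):
            filled[idx] = previous
        else:
            previous = value
    return filled

readings = [float('nan'), 80.0, float('nan'), 76.0]   # made-up hourly humidity
clean = fill_missing(readings)
print(clean, sum(clean) / len(clean))
```

Dividing by len(clean) instead of a fixed 24 also keeps the mean correct when fewer than 24 hourly records are available.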

2. Temperature over one day

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math

data = pd.read_csv('weather1.csv', encoding='gbk')   # gbk is a superset of gb2312
# Temperature curve
hour = list(data['小時'])
tem = list(data['溫度'])
for i in range(0, 24):
    if math.isnan(tem[i]):
        tem[i] = tem[i-1]                  # fill a missing value with the previous entry
tem_ave = sum(tem)/24                      # mean temperature
tem_max = max(tem)
tem_max_hour = hour[tem.index(tem_max)]    # hour of the highest temperature
tem_min = min(tem)
tem_min_hour = hour[tem.index(tem_min)]    # hour of the lowest temperature
x = []
y = []
for i in range(0, 24):
    x.append(i)
    y.append(tem[hour.index(i)])           # reorder the values by hour 0..23
plt.figure(1)
plt.plot(x, y, color='green', label='Temperature')                 # temperature curve
plt.scatter(x, y, color='green')                                   # one point per hour
plt.plot([0, 24], [tem_ave, tem_ave], c='red', linestyle='--', label='Mean temperature')  # dashed mean line
plt.text(tem_max_hour+0.2, tem_max+0.2, str(tem_max), ha='center', va='bottom', fontsize=10.5)  # label the maximum
plt.text(tem_min_hour+0.2, tem_min+0.2, str(tem_min), ha='center', va='bottom', fontsize=10.5)  # label the minimum
plt.xticks(x)
plt.legend()
plt.title('Temperature over one day')
plt.xlabel('Time/h')
plt.ylabel('Celsius/℃')
plt.show()

Result of the run (chart not included).

3. Air quality over one day

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math

data = pd.read_csv('weather1.csv', encoding='gbk')   # gbk is a superset of gb2312
hour = list(data['小時'])
air = list(data['空氣質量'])
print(type(air[0]))
for i in range(0, 24):
    if math.isnan(air[i]):
        air[i] = air[i-1]                  # fill a missing value with the previous entry
air_ave = sum(air)/24                      # mean AQI
air_max = max(air)
air_max_hour = hour[air.index(air_max)]    # hour of the highest AQI
air_min = min(air)
air_min_hour = hour[air.index(air_min)]    # hour of the lowest AQI
x = []
y = []
for i in range(0, 24):
    x.append(i)
    y.append(air[hour.index(i)])           # reorder the values by hour 0..23
plt.figure(3)

# One bar per hour, coloured by AQI level
for i in range(0, 24):
    if y[i] <= 50:
        plt.bar(x[i], y[i], color='lightblue', width=0.9)   # level 1
    elif y[i] <= 100:
        plt.bar(x[i], y[i], color='wheat', width=0.9)       # level 2
    elif y[i] <= 150:
        plt.bar(x[i], y[i], color='red', width=0.9)         # level 3
    elif y[i] <= 200:
        plt.bar(x[i], y[i], color='orangered', width=0.9)   # level 4
    elif y[i] <= 300:
        plt.bar(x[i], y[i], color='darkviolet', width=0.9)  # level 5
    elif y[i] > 300:
        plt.bar(x[i], y[i], color='maroon', width=0.9)      # level 6
plt.plot([0, 24], [air_ave, air_ave], c='black', linestyle='--')     # dashed mean line
plt.text(air_max_hour+0.2, air_max+0.2, str(air_max), ha='center', va='bottom', fontsize=10.5)  # label the maximum
plt.text(air_min_hour+0.2, air_min+0.2, str(air_min), ha='center', va='bottom', fontsize=10.5)  # label the minimum
plt.xticks(x)
plt.title('Air quality over one day')
plt.xlabel('Time/h')
plt.ylabel('Air quality index (AQI)')
plt.show()

Result of the run (chart not included).
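The chain of comparisons that colours the air-quality bars is a lookup of a value in a sorted list of thresholds. A minimal sketch of that lookup with the standard bisect module follows. The thresholds are the ones used in the chart code; the function names are hypothetical, and the level 4 colour is assumed to be orangered, a valid matplotlib colour name.

```python
from bisect import bisect_left

AQI_UPPER_BOUNDS = [50, 100, 150, 200, 300]   # upper limits of levels 1 to 5
AQI_COLORS = ['lightblue', 'wheat', 'red', 'orangered', 'darkviolet', 'maroon']

def aqi_level(value):
    """Return the 1-based level: <=50 is 1, <=100 is 2, ..., above 300 is 6."""
    return bisect_left(AQI_UPPER_BOUNDS, value) + 1

def aqi_color(value):
    """Return the bar colour for an AQI value."""
    return AQI_COLORS[aqi_level(value) - 1]

samples = [35, 50, 51, 180, 300, 420]         # made-up AQI readings
print([(v, aqi_level(v), aqi_color(v)) for v in samples])
```

With this helper the colours for a whole day can be computed up front, for example as a list passed to a single plt.bar call, instead of drawing one bar per branch.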

4. Wind radar chart

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math

def wind_radar(data):
    """Wind radar chart: mean wind level for each of eight directions."""
    wind = list(data['風力方向'])
    wind_speed = list(data['風級'])
    # Map direction names to polar angles (east = 0/360, counter-clockwise).
    # The scraped values are Simplified Chinese.
    for i in range(0, 24):
        if wind[i] == "北风":
            wind[i] = 90
        elif wind[i] == "南风":
            wind[i] = 270
        elif wind[i] == "西风":
            wind[i] = 180
        elif wind[i] == "东风":
            wind[i] = 360
        elif wind[i] == "东北风":
            wind[i] = 45
        elif wind[i] == "西北风":
            wind[i] = 135
        elif wind[i] == "西南风":
            wind[i] = 225
        elif wind[i] == "东南风":
            wind[i] = 315
    degs = np.arange(45, 361, 45)
    temp = []
    for deg in degs:
        speed = []
        # collect the wind levels recorded for this direction
        for i in range(0, 24):
            if wind[i] == deg:
                speed.append(wind_speed[i])
        if len(speed) == 0:
            temp.append(0)
        else:
            temp.append(sum(speed)/len(speed))
    print(temp)
    N = 8
    theta = np.radians(degs)                  # centre each bar on its direction angle
    # bar lengths
    radii = np.array(temp)
    # polar axes
    plt.axes(polar=True)
    # RGB colour per sector: the larger the value, the closer to dark blue
    top = max(temp) or 1                      # avoid dividing by zero when every sector is empty
    colors = [(1-x/top, 1-x/top, 0.6) for x in radii]
    plt.bar(theta, radii, width=(2*np.pi/N), bottom=0.0, color=colors)
    plt.title('Wind level over one day', x=0.2, fontsize=20)
    plt.show()

data = pd.read_csv('weather1.csv', encoding='gbk')   # gbk is a superset of gb2312
wind_radar(data)

Result of the run (chart not included).
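The long if/elif chain in wind_radar can also be written as a dictionary lookup followed by a per-direction average. Here is a minimal sketch of that alternative, assuming the scraped direction labels are Simplified Chinese. The function name, the sample data, and the unmatched label in the sample are hypothetical.

```python
from collections import defaultdict

# Direction names mapped to the polar angles used in wind_radar
# (east = 0/360, counter-clockwise).
DIRECTION_DEG = {
    "东北风": 45, "北风": 90, "西北风": 135, "西风": 180,
    "西南风": 225, "南风": 270, "东南风": 315, "东风": 360,
}

def mean_level_by_direction(directions, levels):
    """Average wind level for each of the eight directions (0 if none seen)."""
    buckets = defaultdict(list)
    for name, level in zip(directions, levels):
        deg = DIRECTION_DEG.get(name)
        if deg is not None:                # ignore labels outside the eight bins
            buckets[deg].append(level)
    return [sum(buckets[d]) / len(buckets[d]) if buckets[d] else 0
            for d in range(45, 361, 45)]

dirs = ["北风", "北风", "东南风", "无持续风向"]   # made-up hourly labels
lvls = [2, 4, 1, 0]
print(mean_level_by_direction(dirs, lvls))
```

The returned list is ordered 45, 90, ..., 360 degrees, the same order as degs in wind_radar, so it could be used directly as the bar lengths.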

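5. Temperature correlation over 14 days

import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt

def calc_corr(a, b):
    """Pearson correlation coefficient (assumed definition: the post calls calc_corr without showing it)."""
    a_avg = sum(a)/len(a)
    b_avg = sum(b)/len(b)
    cov_ab = sum((x - a_avg)*(y - b_avg) for x, y in zip(a, b))
    sq = math.sqrt(sum((x - a_avg)**2 for x in a) * sum((y - b_avg)**2 for y in b))
    return cov_ab/sq

data = pd.read_csv('weather14.csv', encoding='gbk')   # gbk is a superset of gb2312
day = data['日期']
tem = data['最高氣溫']
plt.scatter(day, tem, color='green')
plt.title("Temperature correlation")
plt.xlabel("Date")
plt.ylabel("Temperature/℃")
plt.text(9, 20, "Correlation coefficient: " + str(calc_corr(day, tem)), fontdict={'size': '10', 'color': 'red'})
plt.show()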

Result of the run (chart not included).

Complete code

import requests
from bs4 import BeautifulSoup
import csv
import json

def getHTMLtext(url):
    """Request the page and return its text."""
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        print("Request succeeded")
        return r.text
    except requests.RequestException:
        print("Request failed")
        return " "

def get_content(html):
    """Parse the 7-day page; return (today's hourly rows, daily forecast rows)."""
    final = []                                 # daily forecast rows
    bs = BeautifulSoup(html, "html.parser")    # build the BeautifulSoup object
    body = bs.body
    data = body.find('div', {'id': '7d'})      # the div whose id is 7d
    # Today's hourly data
    data2 = body.find_all('div', {'class': 'left-div'})
    text = data2[2].find('script').string
    text = text[text.index('=')+1:-2]          # drop the leading "var ... =" and the last two characters, leaving JSON
    jd = json.loads(text)
    dayone = jd['od']['od2']                   # hourly records for today
    final_day = []                             # today's hourly rows
    count = 0
    for i in dayone:
        temp = []
        if count <= 23:
            temp.append(i['od21'])             # hour
            temp.append(i['od22'])             # temperature at that hour
            temp.append(i['od24'])             # wind direction
            temp.append(i['od25'])             # wind level
            temp.append(i['od26'])             # precipitation
            temp.append(i['od27'])             # relative humidity
            temp.append(i['od28'])             # air quality
            final_day.append(temp)
        count = count + 1
    # Daily forecast from the 7-day block
    ul = data.find('ul')
    li = ul.find_all('li')
    i = 0                                      # day counter
    for day in li:                             # walk through each li
        if i < 7 and i > 0:                    # skips index 0 (today) and keeps the next six days
            temp = []                          # one day's row
            date = day.find('h1').string       # date text
            date = date[0:date.index('日')]    # day-of-month number
            temp.append(date)
            inf = day.find_all('p')            # p tags under this li
            temp.append(inf[0].string)         # first p: weather description

            tem_low = inf[1].find('i').string  # daily low

            if inf[1].find('span') is None:    # the forecast may have no high
                tem_high = None
            else:
                tem_high = inf[1].find('span').string
            temp.append(tem_low[:-1])          # strip the trailing ℃
            if tem_high is not None and tem_high[-1] == '℃':
                temp.append(tem_high[:-1])
            else:
                temp.append(tem_high)

            wind = inf[2].find_all('span')     # wind directions
            for j in wind:
                temp.append(j['title'])

            wind_scale = inf[2].find('i').string   # wind level text
            index1 = wind_scale.index('级')        # page text is Simplified Chinese
            temp.append(int(wind_scale[index1-1:index1]))
            final.append(temp)
        i = i + 1
    return final_day, final

def get_content2(html):
    """Parse the 15-day page; return rows for the first eight days listed."""
    final = []
    bs = BeautifulSoup(html, "html.parser")                          # build the BeautifulSoup object
    body = bs.body
    data = body.find('div', {'id': '15d'})                           # the div whose id is 15d
    ul = data.find('ul')                                             # the ul inside the div
    li = ul.find_all('li')                                           # all li tags
    i = 0                                                            # day counter
    for day in li:                                                   # walk through each li
        if i < 8:
            temp = []                                                # one day's row
            date = day.find('span', {'class': 'time'}).string        # date text
            date = date[date.index('（')+1:-2]                        # day-of-month number
            temp.append(date)
            weather = day.find('span', {'class': 'wea'}).string      # weather description
            temp.append(weather)
            tem = day.find('span', {'class': 'tem'}).text            # "high/low" temperature text
            temp.append(tem[tem.index('/')+1:-1])                    # daily low
            temp.append(tem[:tem.index('/')-1])                      # daily high
            wind = day.find('span', {'class': 'wind'}).string        # wind direction
            if '转' in wind:                                         # the direction changes during the day
                temp.append(wind[:wind.index('转')])
                temp.append(wind[wind.index('转')+1:])
            else:                                                    # no change: same direction twice
                temp.append(wind)
                temp.append(wind)
            wind_scale = day.find('span', {'class': 'wind1'}).string # wind level text
            index1 = wind_scale.index('级')
            temp.append(int(wind_scale[index1-1:index1]))
            final.append(temp)
        i = i + 1
    return final

def write_to_csv(file_name, data, day=14):
    """Save the rows to a CSV file."""
    # 'w' overwrites, so a rerun does not append a second header row;
    # gbk can encode both Simplified and Traditional Chinese text
    with open(file_name, 'w', encoding='gbk', errors='ignore', newline='') as f:
        if day == 14:
            header = ['日期','天氣','最低氣溫','最高氣溫','風向1','風向2','風級']
        else:
            header = ['小時','溫度','風力方向','風級','降水量','相對濕度','空氣質量']
        f_csv = csv.writer(f)
        f_csv.writerow(header)
        f_csv.writerows(data)

def main():
    """Entry point."""
    print("Weather test")
    url1 = 'http://www.weather.com.cn/weather/101280701.shtml'    # 7-day forecast page
    url2 = 'http://www.weather.com.cn/weather15d/101280701.shtml' # day 8-15 forecast page

    html1 = getHTMLtext(url1)
    data1, data1_7 = get_content(html1)        # today's hourly rows and the daily rows

    html2 = getHTMLtext(url2)
    data8_14 = get_content2(html2)             # rows for the later days
    data14 = data1_7 + data8_14
    write_to_csv('weather14.csv', data14, 14)  # daily forecast file
    write_to_csv('weather1.csv', data1, 1)     # hourly file

if __name__ == '__main__':
    main()

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import math

data = pd.read_csv('weather1.csv', encoding='gbk')   # gbk is a superset of gb2312
# Temperature curve
hour = list(data['小時'])
tem = list(data['溫度'])
for i in range(0, 24):
    if math.isnan(tem[i]):
        tem[i] = tem[i-1]                  # fill a missing value with the previous entry
tem_ave = sum(tem)/24                      # mean temperature
tem_max = max(tem)
tem_max_hour = hour[tem.index(tem_max)]    # hour of the highest temperature
tem_min = min(tem)
tem_min_hour = hour[tem.index(tem_min)]    # hour of the lowest temperature
x = []
y = []
for i in range(0, 24):
    x.append(i)
    y.append(tem[hour.index(i)])           # reorder the values by hour 0..23
plt.figure(1)
plt.plot(x, y, color='green', label='Temperature')   # temperature curve
plt.scatter(x, y, color='green')                     # one point per hour
plt.plot([0, 24], [tem_ave, tem_ave], c='red', linestyle='--', label='Mean temperature')  # dashed mean line
plt.text(tem_max_hour+0.2, tem_max+0.2, str(tem_max), ha='center', va='bottom', fontsize=10.5)  # label the maximum
plt.text(tem_min_hour+0.2, tem_min+0.2, str(tem_min), ha='center', va='bottom', fontsize=10.5)  # label the minimum
plt.xticks(x)
plt.legend()
plt.title('Temperature over one day')
plt.xlabel('Time/h')
plt.ylabel('Celsius/℃')
plt.show()

# Relative humidity curve
hum = list(data['相對濕度'])
for i in range(0, 24):
    if math.isnan(hum[i]):
        hum[i] = hum[i-1]                  # fill a missing value with the previous entry
hum_ave = sum(hum)/24                      # mean relative humidity
hum_max = max(hum)
hum_max_hour = hour[hum.index(hum_max)]    # hour of the highest relative humidity
hum_min = min(hum)
hum_min_hour = hour[hum.index(hum_min)]    # hour of the lowest relative humidity
x = []
y = []
for i in range(0, 24):
    x.append(i)
    y.append(hum[hour.index(i)])           # reorder the values by hour 0..23
plt.figure(2)
plt.plot(x, y, color='green', label='Relative humidity')   # humidity curve
plt.scatter(x, y, color='green')                           # one point per hour
plt.plot([0, 24], [hum_ave, hum_ave], c='red', linestyle='--', label='Mean relative humidity')  # dashed mean line
plt.text(hum_max_hour+0.15, hum_max+0.15, str(hum_max), ha='center', va='bottom', fontsize=10.5)  # label the maximum
plt.text(hum_min_hour+0.15, hum_min+0.15, str(hum_min), ha='center', va='bottom', fontsize=10.5)  # label the minimum
plt.xticks(x)
plt.legend()
plt.title('Relative humidity over one day')
plt.xlabel('Time/h')
plt.ylabel('Percent/%')
plt.show()

# Air quality bars
air = list(data['空氣質量'])
print(type(air[0]))
for i in range(0, 24):
    if math.isnan(air[i]):
        air[i] = air[i-1]                  # fill a missing value with the previous entry
air_ave = sum(air)/24                      # mean AQI
air_max = max(air)
air_max_hour = hour[air.index(air_max)]    # hour of the highest AQI
air_min = min(air)
air_min_hour = hour[air.index(air_min)]    # hour of the lowest AQI
x = []
y = []
for i in range(0, 24):
    x.append(i)
    y.append(air[hour.index(i)])           # reorder the values by hour 0..23
plt.figure(3)

for i in range(0, 24):
    if y[i] <= 50:
        plt.bar(x[i], y[i], color='lightblue', width=0.9)   # level 1
    elif y[i] <= 100:
        plt.bar(x[i], y[i], color='wheat', width=0.9)       # level 2
    elif y[i] <= 150:
        plt.bar(x[i], y[i], color='red', width=0.9)         # level 3
    elif y[i] <= 200:
        plt.bar(x[i], y[i], color='orangered', width=0.9)   # level 4
    elif y[i] <= 300:
        plt.bar(x[i], y[i], color='darkviolet', width=0.9)  # level 5
    elif y[i] > 300:
        plt.bar(x[i], y[i], color='maroon', width=0.9)      # level 6
plt.plot([0, 24], [air_ave, air_ave], c='black', linestyle='--')     # dashed mean line
plt.text(air_max_hour+0.2, air_max+0.2, str(air_max), ha='center', va='bottom', fontsize=10.5)  # label the maximum
plt.text(air_min_hour+0.2, air_min+0.2, str(air_min), ha='center', va='bottom', fontsize=10.5)  # label the minimum
plt.xticks(x)
plt.title('Air quality over one day')
plt.xlabel('Time/h')
plt.ylabel('Air quality index (AQI)')
plt.show()

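Summary

After analyzing and visualizing the data, we can see at what time of day the city's temperature is highest and lowest and what the average temperature is. Humidity, air quality, and wind were analyzed as well, and the charts show how each is distributed.

What I gained: a better understanding of web scraping and more practice with it.

Difficulties and improvements: the hard parts were quickly and accurately finding the relevant lines when reading the page source, and my limited experience with data visualization.

One thing to improve is that the analysis code was not organized into functions.

Closing remarks

Through this semester's Python course I came to understand Python more deeply and to appreciate it. My command of it is still not solid, though, and I have not practiced enough. I also need to study big data analysis and machine learning in more depth, and to keep learning and practicing.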
