Fetching Data from China Weather Network (weather.com.cn)

Reposted article, 2019-07-02 23:11. Tag: web crawler

# China Weather Network (weather.com.cn)
# Practice parsing HTML with BeautifulSoup
# Data visualization

import requests
from bs4 import BeautifulSoup
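import html5lib  # not called directly; BeautifulSoup loads it when the 'html5lib' parser is requested
from pyecharts import Bar  # pyecharts 0.5.x API; in pyecharts >= 1.0, Bar lives in pyecharts.charts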

ALL_DATA = []

def parse_page(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36"
    }
    response = requests.get(url=url, headers=headers, timeout=10)
    text = response.content.decode("utf-8")
    
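    # Build the soup object. Elements are located mainly with find/find_all,
    # and stripped_strings returns all the text under a tag.
    # Pitfall 1: use html5lib because the Hong Kong/Macau/Taiwan page (gat.shtml)
    # is formatted differently from the other URLs: its tables have no closing tags.
    # html5lib automatically completes the missing </table> closing tags.
    soup = BeautifulSoup(text, 'html5lib')
    conMidtab = soup.find("div", class_="conMidtab")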
    tables = conMidtab.find_all('table')
    for table in tables:
        print("========")
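        trs = table.find_all('tr')[2:]  # skip the first two rows (table headers)
        for index, tr in enumerate(trs):
            tds = tr.find_all('td')
            # Pitfall 2: in the first data row of each table, tds[0] holds the
            # province name, so the city name is in tds[1] instead.
            city_td = tds[0]
            if index == 0:
                city_td = tds[1]
            # stripped_strings yields every text fragment under the tag with
            # surrounding whitespace removed; the first fragment is the city name.
            city = list(city_td.stripped_strings)[0]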
            
            # The minimum temperature is in the second-to-last cell of each row
            temp_td = tds[-2]
            min_temp = int(list(temp_td.stripped_strings)[0])
            record = {"city": city, "min_temp": min_temp}
            ALL_DATA.append(record)
            print(record)

def main():
    urls = [
        'http://www.weather.com.cn/textFC/hb.shtml',   # North China
        'http://www.weather.com.cn/textFC/db.shtml',   # Northeast China
        'http://www.weather.com.cn/textFC/hd.shtml',   # East China
        'http://www.weather.com.cn/textFC/hz.shtml',   # Central China
        'http://www.weather.com.cn/textFC/hn.shtml',   # South China
        'http://www.weather.com.cn/textFC/xb.shtml',   # Northwest China
        'http://www.weather.com.cn/textFC/xn.shtml',   # Southwest China
        'http://www.weather.com.cn/textFC/gat.shtml'   # Hong Kong, Macau, Taiwan
    ]
    for url in urls:
        parse_page(url)

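    # Data analysis: sort by minimum temperature.
    # key= sets the sort key: the lambda receives each dict in ALL_DATA and
    # returns data["min_temp"], so the records are ordered coldest first.
    ALL_DATA.sort(key=lambda data: data["min_temp"])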

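    data = ALL_DATA[0:10]  # the 10 cities/districts with the lowest minimum temperatures
    cities = list(map(lambda x: x['city'], data))
    temps = list(map(lambda x: x['min_temp'], data))
    chart = Bar()
    # "最低氣溫表" ("minimum temperature chart") is the series name shown in the legend.
    # In pyecharts >= 1.0 the equivalent calls are add_xaxis(cities) and add_yaxis("最低氣溫表", temps).
    chart.add("最低氣溫表",
              cities,
              temps,
              is_more_utils=True)  # show the extra toolbox buttons
    chart.render('temperature.html')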

if __name__ == '__main__':
    main()
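Pitfall 1 in the script above is about how a missing `</table>` gets repaired. In the HTML5 parsing algorithm, which html5lib implements, a `<table>` start tag met while a table is still open (and outside any cell) closes that table first, so consecutive unclosed tables still come out as siblings. A parser without this rule may nest them instead, and because `find_all('tr')` searches recursively, rows could then be attributed to the wrong table or counted twice. The stdlib-only sketch below reproduces just that one rule with `html.parser.HTMLParser`, so it can be tried without network access or third-party packages. It is an illustration, not how BeautifulSoup or html5lib work internally; `TableRowCollector` is a hypothetical name, the sample markup is made up, and it assumes every `<tr>` and `<td>` is closed properly.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Group <td> texts into rows, and rows into tables.

    A <table> start tag seen while another table is still open closes the
    open one, which is the repair needed when a page omits </table>.
    Assumes every <tr> and <td> is closed properly.
    """

    def __init__(self):
        super().__init__()
        self.tables = []        # list of tables; a table is a list of rows
        self._in_table = False  # True between <table> and </table>
        self._row = None        # cell texts of the current row, or None
        self._cell = None       # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            # Start a new table; any table still open is implicitly closed.
            self.tables.append([])
            self._in_table = True
            self._row = None
            self._cell = None
        elif tag == "tr" and self._in_table:
            self._row = []
            self.tables[-1].append(self._row)
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            self._row = None
            self._cell = None
        elif tag == "table":
            self._in_table = False
            self._row = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) still belongs to the open cell.
        if self._cell is not None:
            self._cell.append(data)


# Made-up markup: the first table is missing its </table>.
html = (
    "<table><tr><td>A</td><td>1</td></tr>"
    "<table><tr><td>B</td><td>2</td></tr></table>"
)
collector = TableRowCollector()
collector.feed(html)
collector.close()
print(collector.tables)  # [[['A', '1']], [['B', '2']]]
```

The two rows land in two separate tables rather than one nested structure, which is the outcome the script relies on html5lib to produce for gat.shtml.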
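Pitfall 2 comes from the table layout: according to the script's comments, the first data row of each province table starts with the province name (presumably a single cell spanning the rows below it), so only that row has its city at index 1, while later rows start with the city. The sketch below isolates that indexing rule on rows already reduced to lists of cell texts, so it can be checked offline. `extract_min_temps` is a hypothetical helper and the sample rows are invented; the minimum temperature is read from the second-to-last cell, mirroring `tds[-2]` in the script.

```python
def extract_min_temps(rows):
    """Turn one province table's data rows into {'city', 'min_temp'} dicts.

    `rows` holds the data rows (header rows already skipped), each as a list
    of cell texts. Only the first row starts with the province cell, so its
    city name is at index 1; in every later row it is at index 0. The minimum
    temperature is the second-to-last cell, as with tds[-2] in the script.
    """
    records = []
    for index, cells in enumerate(rows):
        city = cells[1] if index == 0 else cells[0]
        records.append({"city": city, "min_temp": int(cells[-2])})
    return records


# Invented sample rows; the real tables have more weather columns.
rows = [
    ["Beijing", "Haidian", "Sunny", "30", "18", "details"],
    ["Chaoyang", "Cloudy", "29", "17", "details"],
]
print(extract_min_temps(rows))
# [{'city': 'Haidian', 'min_temp': 18}, {'city': 'Chaoyang', 'min_temp': 17}]
```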
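The sort-then-slice in `main()` can also be written with the standard library's `heapq.nsmallest`, which the Python documentation describes as equivalent to `sorted(iterable, key=key)[:n]`, without having to sort the whole list. A minimal sketch, assuming made-up records shaped like `ALL_DATA` and a hypothetical helper named `coldest`; `operator.itemgetter` stands in for the `lambda` key, and list comprehensions replace `list(map(lambda ...))`.

```python
import heapq
from operator import itemgetter


def coldest(records, n=10):
    """Return the n records with the lowest min_temp, coldest first."""
    return heapq.nsmallest(n, records, key=itemgetter("min_temp"))


# Made-up records in the same shape as ALL_DATA in the script.
all_data = [
    {"city": "A", "min_temp": 12},
    {"city": "B", "min_temp": -3},
    {"city": "C", "min_temp": 5},
    {"city": "D", "min_temp": -3},
]

top = coldest(all_data, n=2)
cities = [d["city"] for d in top]
temps = [d["min_temp"] for d in top]
print(cities, temps)  # ['B', 'D'] [-3, -3]
```

Ties keep their original order, just as with the stable `list.sort` used in the script, so both approaches pick the same ten records.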
