Python Web Scraping (21): Scraping and Visualizing Minimum Temperatures from the China Weather Network

Reposted article. 2020-03-10 21:50

The web page is shown in the figure.

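1. Page analysis

Start by scraping the North China region.

URL for North China: http://www.weather.com.cn/textFC/hb.shtml

URL for Northeast China: http://www.weather.com.cn/textFC/db.shtml

The URLs of the other regions follow the same pattern and are easy to work out.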
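
The region pages appear to differ only in a short code in the file name. As a minimal sketch, the URLs can be built from those codes. The helper name `region_url` and the `REGION_CODES` mapping are illustrative and not part of the original code; the codes themselves are taken from the URL list used later in this article, and the English labels are my reading of those abbreviations.

```python
# URL template shared by all regional forecast pages
BASE_URL = "http://www.weather.com.cn/textFC/{code}.shtml"

# Region codes as they appear in the article's URL list
# (the English labels are an interpretation of the abbreviations)
REGION_CODES = {
    "hb": "North China",
    "db": "Northeast China",
    "hd": "East China",
    "hz": "Central China",
    "hn": "South China",
    "xb": "Northwest China",
    "xn": "Southwest China",
    "gat": "Hong Kong, Macau and Taiwan",
}


def region_url(code):
    """Return the forecast page URL for a known region code."""
    if code not in REGION_CODES:
        raise ValueError(f"unknown region code: {code!r}")
    return BASE_URL.format(code=code)


if __name__ == "__main__":
    for code, name in REGION_CODES.items():
        print(name, region_url(code))
```

Keeping the codes in one place makes it harder to mistype a URL when the list of regions is copied between scripts.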

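The data for each province or municipality is in its own table.

Within a table, the weather rows start at the third tr tag; the first two tr tags are skipped.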
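
To see what starting at the third tr means in practice, here is a small sketch on a hand-written table that only mimics the layout described above (two leading rows, then one row per city). The sample HTML, the city names, and the values are made up for illustration, and the sketch uses Python's built-in `html.parser` backend so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Made-up markup that imitates the described layout; not copied from the site
SAMPLE_HTML = """
<div class="conMidtab">
  <table>
    <tr><td>Province/City</td><td>Daytime</td><td>Night</td><td></td></tr>
    <tr><td>City</td><td>High</td><td>Low</td><td>Details</td></tr>
    <tr><td>CityA</td><td>10</td><td>-3</td><td>link</td></tr>
    <tr><td>CityB</td><td>12</td><td>1</td><td>link</td></tr>
  </table>
</div>
"""


def extract_rows(html):
    """Return (city, temp_min) pairs from the first table in the div."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("div", class_="conMidtab").find("table")
    rows = []
    # [2:] drops the first two tr tags, so iteration starts at the third
    for tr in table.find_all("tr")[2:]:
        tds = tr.find_all("td")
        city = list(tds[0].stripped_strings)[0]
        # The minimum temperature is in the second-to-last td
        temp_min = list(tds[-2].stripped_strings)[0]
        rows.append((city, temp_min))
    return rows


print(extract_rows(SAMPLE_HTML))
```

Note that the temperatures come back as strings at this stage; they are only converted to integers later, when they need to be sorted.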

2. Scraping city data for North China

import requests
from bs4 import BeautifulSoup


def parse_page(url):
    # Send a browser-like User-Agent with the request
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
    }
    response = requests.get(url, headers=headers)
    text = response.content.decode('utf-8')
    soup = BeautifulSoup(text, 'lxml')
    # The first div with class "conMidtab" holds the tables
    conMidtab = soup.find('div', class_='conMidtab')
    tables = conMidtab.find_all('table')
    for table in tables:
        # Skip the first two tr tags; the weather rows start at the third
        trs = table.find_all('tr')[2:]
        for tr in trs:
            tds = tr.find_all('td')
            # City name: first string in the first td
            city_td = tds[0]
            city = list(city_td.stripped_strings)[0]
            # Minimum temperature: first string in the second-to-last td
            temp_td = tds[-2]
            temp_min = list(temp_td.stripped_strings)[0]
            print({"city": city, "temp_min": temp_min})


def main():
    url = "http://www.weather.com.cn/textFC/hb.shtml"
    parse_page(url)


if __name__ == '__main__':
    main()

Running this prints the minimum temperature of every city in North China.

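3. Scraping minimum temperatures for all cities

Testing shows that, apart from North China and the Hong Kong/Macau/Taiwan region, the city rows in the other regions are laid out differently.

In North China, the first city is in the third tr tag and the city name is in the first td tag under that tr. In these other regions the first city is still in the third tr tag, but the city name is in the second td tag under that tr.

A few lines of code need to be added to handle this: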

            city_td = tds[0]
            # For the first data row (index 0) the city is in the second td;
            # every other row uses the first td
            if index == 0:
                city_td = tds[1]
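
The rule above can be tried out on plain lists that stand in for the td cells of each row. This is only a sketch: `pick_city` is a hypothetical helper, and the row contents are made up to mimic a table whose first data row starts with one extra leading cell.

```python
def pick_city(cells, index):
    """Return the city name from one row's cell texts.

    index is the row's position among the data rows (0 for the first one).
    """
    # First data row: the city name is in the second cell, not the first
    return cells[1] if index == 0 else cells[0]


# Made-up rows; the first one carries an extra leading cell
rows = [
    ["ProvinceX", "CityA", "Sunny", "10", "-3", "link"],
    ["CityB", "Cloudy", "12", "1", "link"],
]

cities = [pick_city(cells, i) for i, cells in enumerate(rows)]
temps = [cells[-2] for cells in rows]
print(cities, temps)
```

Because the temperature is read with a negative index (`tds[-2]` in the scraper), an extra cell at the start of a row does not shift it; only the city lookup needs the special case.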

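Next comes the Hong Kong/Macau/Taiwan page. Viewing the page source shows that its HTML is malformed: some tags are opened but never closed. Parsing it the same way as above gives incorrect results.

In this case the lxml parser cannot be used; the html5lib parser is needed instead:

soup=BeautifulSoup(text,'html5lib')

If html5lib is not installed, install it with pip install html5lib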
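
Beautiful Soup raises `FeatureNotFound` when the requested parser is not installed. The following sketch tries html5lib first and falls back to Python's built-in `html.parser`. The helper name `make_soup` and the fallback itself are assumptions added for illustration, not something the original article does, and the fallback parser may build a different tree for malformed markup, so it is not a substitute for html5lib on the Hong Kong/Macau/Taiwan page.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(text):
    """Parse text with html5lib if available, else with html.parser."""
    try:
        return BeautifulSoup(text, "html5lib")
    except FeatureNotFound:
        # html5lib is missing; html.parser ships with Python itself
        return BeautifulSoup(text, "html.parser")


# Made-up fragment with td and tr tags that are never closed
broken = "<table><tr><td>CityA<td>-3</table>"
soup = make_soup(broken)
print(len(soup.find_all("td")))
```

In a real run it is probably better to let the error surface and install html5lib, since the two parsers can disagree on how the unclosed tags nest.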

import requests
from bs4 import BeautifulSoup


def parse_page(url):
    # Send a browser-like User-Agent with the request
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
    }
    response = requests.get(url, headers=headers)
    text = response.content.decode('utf-8')
    # html5lib copes with the unclosed tags on the Hong Kong/Macau/Taiwan page
    soup = BeautifulSoup(text, 'html5lib')
    conMidtab = soup.find('div', class_='conMidtab')
    tables = conMidtab.find_all('table')
    for table in tables:
        # Skip the first two tr tags; the weather rows start at the third
        trs = table.find_all('tr')[2:]
        for index, tr in enumerate(trs):
            tds = tr.find_all('td')
            city_td = tds[0]
            # For the first data row (index 0) the city is in the second td;
            # every other row uses the first td
            if index == 0:
                city_td = tds[1]
            city = list(city_td.stripped_strings)[0]
            # Minimum temperature: first string in the second-to-last td
            temp_td = tds[-2]
            temp_min = list(temp_td.stripped_strings)[0]
            print({"city": city, "temp_min": temp_min})


def main():
    # A list (not a set) so the regions are fetched in a fixed order
    urls = [
        'http://www.weather.com.cn/textFC/hb.shtml',
        'http://www.weather.com.cn/textFC/db.shtml',
        'http://www.weather.com.cn/textFC/hd.shtml',
        'http://www.weather.com.cn/textFC/hz.shtml',
        'http://www.weather.com.cn/textFC/hn.shtml',
        'http://www.weather.com.cn/textFC/xb.shtml',
        'http://www.weather.com.cn/textFC/xn.shtml',
        'http://www.weather.com.cn/textFC/gat.shtml'
    ]
    for url in urls:
        parse_page(url)


if __name__ == '__main__':
    main()

4. Visualizing the ten lowest temperatures nationwide

import requests
from bs4 import BeautifulSoup
from pyecharts import options as opts
from pyecharts.charts import Bar

# Collected records of the form {"city": ..., "temp_min": ...}
ALL_DATA = []


def parse_page(url):
    # Send a browser-like User-Agent with the request
    headers = {
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36"
    }
    response = requests.get(url, headers=headers)
    text = response.content.decode('utf-8')
    soup = BeautifulSoup(text, 'html5lib')
    conMidtab = soup.find('div', class_='conMidtab')
    tables = conMidtab.find_all('table')
    for table in tables:
        # Skip the first two tr tags; the weather rows start at the third
        trs = table.find_all('tr')[2:]
        for index, tr in enumerate(trs):
            tds = tr.find_all('td')
            city_td = tds[0]
            # For the first data row (index 0) the city is in the second td;
            # every other row uses the first td
            if index == 0:
                city_td = tds[1]
            city = list(city_td.stripped_strings)[0]
            temp_td = tds[-2]
            temp_min = list(temp_td.stripped_strings)[0]
            # Store the temperature as an int so the records can be sorted
            ALL_DATA.append({"city": city, "temp_min": int(temp_min)})


def main():
    # A list (not a set) so the regions are fetched in a fixed order
    urls = [
        'http://www.weather.com.cn/textFC/hb.shtml',
        'http://www.weather.com.cn/textFC/db.shtml',
        'http://www.weather.com.cn/textFC/hd.shtml',
        'http://www.weather.com.cn/textFC/hz.shtml',
        'http://www.weather.com.cn/textFC/hn.shtml',
        'http://www.weather.com.cn/textFC/xb.shtml',
        'http://www.weather.com.cn/textFC/xn.shtml',
        'http://www.weather.com.cn/textFC/gat.shtml'
    ]
    for url in urls:
        parse_page(url)

    # Sort once, after all pages are parsed, and keep the ten lowest
    ALL_DATA.sort(key=lambda data: data['temp_min'])
    data = ALL_DATA[0:10]

    # Draw the bar chart with pyecharts. The calls below use the 1.x-style
    # API, which matches the "from pyecharts.charts import Bar" import.
    cities = [x['city'] for x in data]
    temps = [x['temp_min'] for x in data]
    chart = Bar()
    chart.add_xaxis(cities)
    chart.add_yaxis('', temps)
    chart.set_global_opts(title_opts=opts.TitleOpts(title="China Lowest Temperature Ranking"))
    chart.render('temperature.html')


if __name__ == '__main__':
    main()
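
The selection of the ten lowest values can be checked without any network access. The sketch below uses made-up city names and temperatures, a hypothetical `to_int` helper that returns None instead of raising when a cell does not contain a number, and `heapq.nsmallest` from the standard library as an alternative to sorting the whole list. None of these helpers come from the original code.

```python
import heapq


def to_int(text):
    """Convert a scraped temperature string to int, or None if not numeric."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def lowest(records, n=10):
    """Return the n records with the smallest temp_min, lowest first."""
    return heapq.nsmallest(n, records, key=lambda r: r["temp_min"])


# Made-up scraped values; "-" stands for a cell without a usable number
raw = [("CityA", "-3"), ("CityB", "1"), ("CityC", "-12"), ("CityD", "-")]

records = []
for city, text in raw:
    value = to_int(text)
    if value is not None:  # skip rows without a usable number
        records.append({"city": city, "temp_min": value})

for record in lowest(records, n=2):
    print(record["city"], record["temp_min"])
```

The resulting list has the same shape as `data` in `main()`, so the city and temperature lists for the chart can be built from it in the same way.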
