[Matplotlib] Data Visualization: Worked Examples


 


Author: Bai Ningchao (白寧超)

July 19, 2017, 09:09:07

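Abstract: Data visualization aims to convey and communicate information clearly and effectively through graphical means. That does not mean a visualization must be dull in order to be functional, or extremely complicated in order to look impressive. To convey ideas effectively, aesthetic form and function have to go hand in hand, giving insight into sparse and complex data sets by communicating their key aspects and features intuitively. Designers, however, often fail to balance design and function, and end up producing showy visualizations that miss their main purpose, which is to convey and communicate information. Data visualization is closely related to information graphics, information visualization, scientific visualization, and statistical graphics. It is currently a very active and important area in research, teaching, and development. The term "data visualization" unifies the mature field of scientific visualization with the younger field of information visualization. (This article is original work; please credit the source when reposting: Data Visualization: Worked Examples.)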

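1 Drawing a Line Chart


1.1 Requirements

Use matplotlib to draw a simple line chart, then customize it so that the visualization carries more information: plot the squares of (1, 2, 3, 4, 5) as a line.

1.2 Source Code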

# coding=utf-8
import matplotlib as mpl
import matplotlib.pyplot as plt

# Use a font with Chinese glyphs so the Chinese labels render correctly,
# and keep the minus sign displayable with that font
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False

input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
# Plot the line and set its width
plt.plot(input_values, squares, linewidth=5)

# Set the title and axis labels
plt.title('平方數圖', fontsize=24)    # "Square numbers"
plt.xlabel('', fontsize=14)
plt.ylabel('平方值', fontsize=14)     # "Square of value"

# Set the size of the tick labels
plt.tick_params(axis='both', labelsize=14)

plt.show()

1.3 Result

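2 Drawing a Scatter Plot with scatter()


2.1 Requirements

Use matplotlib to draw a simple scatter plot, then customize it so that the visualization carries more information: plot the values (1, 2, 3, 4, 5) against their squares as individual points.

2.2 Source Code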

# coding=utf-8
import matplotlib as mpl
import matplotlib.pyplot as plt

# Use a font with Chinese glyphs so the Chinese labels render correctly
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# x and y coordinates of the points
x_values = [1, 2, 3, 4, 5]
y_values = [1, 4, 9, 16, 25]

# s sets the marker size; edgecolors='none' removes the marker outlines
plt.scatter(x_values, y_values, c='red', edgecolors='none', s=40)

# Set the title and axis labels
plt.title('平方數圖', fontsize=24)    # "Square numbers"
plt.xlabel('', fontsize=14)
plt.ylabel('平方值', fontsize=14)     # "Square of value"

# Set the size of the tick labels
plt.tick_params(axis='both', which='major', labelsize=14)

# Save the chart to a file; bbox_inches='tight' trims the surrounding whitespace
plt.savefig('squares_plot.png', bbox_inches='tight')

plt.show()

 

2.3 Result

 

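2.4 Refined Requirements

Use matplotlib to draw a simple scatter plot and then customize it so that the visualization carries more information: plot 1000 points, compute their squares automatically, and set custom axis ranges.

2.5 Improved Source Code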

# coding=utf-8
import matplotlib as mpl
import matplotlib.pyplot as plt

# Use a font with Chinese glyphs so the Chinese labels render correctly
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# Hand-typed coordinates from the previous version
# x_values = [1,2,3,4,5]
# y_values = [1,4,9,16,25]

# Compute the data instead of typing it in
x_values = list(range(1, 1001))
y_values = [x**2 for x in x_values]

# s sets the marker size; edgecolors='none' removes the marker outlines
# plt.scatter(x_values,y_values,c='red',edgecolor='none',s=40)

# Custom color given as an RGB tuple (red, green, blue)
# plt.scatter(x_values,y_values,c=(0,0.8,0.8),edgecolor='none',s=40)

# Color each point by its y value using the Reds colormap
plt.scatter(x_values, y_values, c=y_values, cmap=plt.cm.Reds, edgecolors='none', s=40)

# Set the title and axis labels
plt.title('平方數圖', fontsize=24)    # "Square numbers"
plt.xlabel('', fontsize=14)
plt.ylabel('平方值', fontsize=14)     # "Square of value"

# Set the range of each axis: [xmin, xmax, ymin, ymax]
plt.axis([0, 1100, 0, 1100000])

# Set the size of the tick labels
plt.tick_params(axis='both', which='major', labelsize=14)

# Save the chart to a file; bbox_inches='tight' trims the surrounding whitespace
plt.savefig('squares_plot.png', bbox_inches='tight')

plt.show()
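
The axis range passed to plt.axis() above is typed in by hand. As a rough sketch, the same limits can be derived from the data by adding a margin above the largest value. The helper name padded_upper and the 10% margin are assumptions made for this illustration; they are not part of matplotlib.

```python
def padded_upper(values, percent=10):
    """Return the largest value plus `percent` percent of it (integer arithmetic)."""
    largest = max(values)
    return largest + largest * percent // 100


x_values = list(range(1, 1001))
y_values = [x ** 2 for x in x_values]

# [xmin, xmax, ymin, ymax], the order expected by plt.axis()
axis_limits = [0, padded_upper(x_values), 0, padded_upper(y_values)]
print(axis_limits)    # [0, 1100, 0, 1100000]
```

With this data the computed limits match the hand-typed ones, and they would follow the data if the number of points changed.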

2.6 Improved Result

 

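3 Random Walk Plot


3.1 Requirements

In a random walk, the direction and length of every step are chosen at random; there is no fixed heading, and the path is the outcome of a series of random decisions. In this example, random_walk chooses the left/right and up/down direction and the length of each step at random, and rw_visual displays the result graphically.

3.2 Source Code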

random_walk.py

from random import choice


class RandomWalk:
    '''A class that generates random walk data.'''

    def __init__(self, num_points=5000):
        '''Initialize the attributes of the walk.'''
        self.num_points = num_points
        # Every walk starts at (0, 0)
        self.x_values = [0]
        self.y_values = [0]

    def fill_walk(self):
        '''Compute all the points in the walk.'''
        while len(self.x_values) < self.num_points:
            # Choose a direction and how far to move in that direction
            x_direction = choice([1, -1])
            x_distance = choice([0, 1, 2, 3, 4])
            x_step = x_direction * x_distance

            y_direction = choice([1, -1])
            y_distance = choice([0, 1, 2, 3, 4])
            y_step = y_direction * y_distance

            # Reject steps that go nowhere
            if x_step == 0 and y_step == 0:
                continue

            # Compute the x and y of the next point
            next_x = self.x_values[-1] + x_step
            next_y = self.y_values[-1] + y_step

            self.x_values.append(next_x)
            self.y_values.append(next_y)
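
A quick way to sanity-check the walk logic is to run it with a fixed seed and inspect the points. The sketch below restates the same stepping rule as a standalone function so that it runs on its own; the function name random_walk_points and the seed value are assumptions for illustration and are not part of the original module.

```python
import random


def random_walk_points(num_points, rng):
    """Generate num_points (x, y) tuples using the same stepping rule as RandomWalk."""
    points = [(0, 0)]
    while len(points) < num_points:
        x_step = rng.choice([1, -1]) * rng.choice([0, 1, 2, 3, 4])
        y_step = rng.choice([1, -1]) * rng.choice([0, 1, 2, 3, 4])
        # Reject steps that go nowhere
        if x_step == 0 and y_step == 0:
            continue
        last_x, last_y = points[-1]
        points.append((last_x + x_step, last_y + y_step))
    return points


rng = random.Random(42)          # fixed seed so the walk is reproducible
points = random_walk_points(1000, rng)

print(len(points))               # number of points generated
print(points[0])                 # the walk starts at the origin
```

Passing the random generator in as an argument keeps the function repeatable: the same seed always yields the same path.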

rw_visual.py

# coding=utf-8
import matplotlib as mpl
import matplotlib.pyplot as plt
from random_walk import RandomWalk

# Use a font with Chinese glyphs so the Chinese labels render correctly
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# Create a RandomWalk instance and generate its points
rw = RandomWalk()
rw.fill_walk()

plt.figure(figsize=(10, 6))

point_numbers = list(range(rw.num_points))

# Color the points from light to dark red in the order they were generated
plt.scatter(rw.x_values, rw.y_values, c=point_numbers, cmap=plt.cm.Reds, edgecolors='none', s=1)

# Highlight the start point (green) and the end point (blue)
plt.scatter(0, 0, c='green', edgecolors='none', s=100)
plt.scatter(rw.x_values[-1], rw.y_values[-1], c='blue', edgecolors='none', s=100)

# Set the title and axis labels
plt.title('隨機漫步圖', fontsize=24)    # "Random walk"
plt.xlabel('左右步數', fontsize=14)     # "Steps left/right"
plt.ylabel('上下步數', fontsize=14)     # "Steps up/down"

# Hide the axes; plt.gca() returns the current axes instead of creating new ones
ax = plt.gca()
ax.get_xaxis().set_visible(False)
ax.get_yaxis().set_visible(False)

plt.show()

3.3 Result

 

 

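4 Simulating Dice Rolls with Pygal


4.1 Requirements

Analyze the results of rolling dice: generate a data set of rolls and draw a chart from it.

4.2 Source Code

die.py (the Die class)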

import random


class Die:
    """A single die."""

    def __init__(self, num_sides=6):
        self.num_sides = num_sides

    def roll(self):
        # Return a random number between 1 and the number of sides, inclusive
        return random.randint(1, self.num_sides)

die_visual.py

# coding=utf-8
import pygal

from die import Die

die1 = Die()
die2 = Die()

# Roll both dice 1000 times and record the sums
results = []
for roll_num in range(1000):
    result = die1.roll() + die2.roll()
    results.append(result)
# print(results)

# Analyze the results: count how often each sum occurred
frequencies = []
max_result = die1.num_sides + die2.num_sides
for value in range(2, max_result + 1):
    frequency = results.count(value)
    frequencies.append(frequency)
print(frequencies)

# Draw the frequencies as a bar chart
hist = pygal.Bar()

hist.title = '骰子投擲1000次各面結果統計圖'    # "Results of rolling the dice 1000 times"
hist.x_labels = [str(x) for x in range(2, max_result + 1)]
hist.x_title = '結果'        # "Result"
hist.y_title = '結果分布'    # "Frequency of result"

hist.add('D6+D6', frequencies)
hist.render_to_file('die_visual.svg')
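
Calling results.count(value) once per possible sum scans the whole list each time. One possible alternative is to tally all sums in a single pass with collections.Counter from the standard library. The sketch below inlines a small Die class so that it runs on its own; the seed and the variable names are assumptions for illustration.

```python
import random
from collections import Counter


class Die:
    """A die with num_sides faces (same interface as the Die class above)."""

    def __init__(self, num_sides=6):
        self.num_sides = num_sides

    def roll(self):
        return random.randint(1, self.num_sides)


random.seed(1)                    # fixed seed, only to make the run repeatable
die1, die2 = Die(), Die()
max_result = die1.num_sides + die2.num_sides

# Count every sum in one pass over the rolls
counts = Counter(die1.roll() + die2.roll() for _ in range(1000))

# A Counter returns 0 for sums that never occurred, so there is one entry per possible sum
frequencies = [counts[value] for value in range(2, max_result + 1)]
print(frequencies)
```

The resulting list has the same shape as the one built with results.count(), so it can be handed to hist.add() unchanged.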

4.3 Result

 

 

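5 Rolling Two Dice Together


5.1 Requirements

Analyze the results of rolling two dice at the same time: generate a data set of rolls and draw a chart from it.

5.2 Source Code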

 

# coding=utf-8
import pygal

from die import Die

# A six-sided die and a ten-sided die
die1 = Die()
die2 = Die(10)

# Roll both dice 5000 times and record the sums
results = []
for roll_num in range(5000):
    result = die1.roll() + die2.roll()
    results.append(result)
# print(results)


# Analyze the results: count how often each sum occurred
frequencies = []
max_result = die1.num_sides + die2.num_sides
for value in range(2, max_result + 1):
    frequency = results.count(value)
    frequencies.append(frequency)
# print(frequencies)


hist = pygal.Bar()
hist.title = 'D6 和 D10 骰子5000次投擲的結果直方圖'    # "Results of rolling a D6 and a D10 5000 times"
# hist.x_labels=['2','3','4','5','6','7','8','9','10','11','12','13','14','15','16']
hist.x_labels = [str(x) for x in range(2, max_result + 1)]
hist.x_title = 'Result'
hist.y_title = 'Frequency of Result'

hist.add('D6 + D10', frequencies)
hist.render_to_file('dice_visual.svg')
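
To judge whether the simulated bar chart looks right, it can help to compare it with the exact distribution. As a sketch, the expected number of times each sum appears follows from counting the face combinations that produce it; the function name expected_frequencies is an assumption for illustration.

```python
from collections import Counter


def expected_frequencies(sides1, sides2, num_rolls):
    """Expected count of each sum when rolling two fair dice num_rolls times."""
    # Number of (face1, face2) combinations that produce each sum
    ways = Counter(a + b
                   for a in range(1, sides1 + 1)
                   for b in range(1, sides2 + 1))
    total = sides1 * sides2
    return {s: num_rolls * ways[s] / total for s in range(2, sides1 + sides2 + 1)}


expected = expected_frequencies(6, 10, 5000)
print(expected[2])    # one combination out of 60
print(expected[7])    # six combinations out of 60
```

For a D6 and a D10, every sum from 7 to 11 can be formed in six ways, so the chart should show a flat top across those sums rather than a single peak.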

5.3 Result

 

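6 Plotting a Temperature Chart


6.1 Requirements

Process a CSV file: extract and read the weather data, plot a temperature chart with dates on the x-axis, draw the daily high and low temperatures as lines, and shade the area between them.

6.2 Source Code

Part of the data in the CSV file, for July 2014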

AKDT,Max TemperatureF,Mean TemperatureF,Min TemperatureF,Max Dew PointF,MeanDew PointF,Min DewpointF,Max Humidity, Mean Humidity, Min Humidity, Max Sea Level PressureIn, Mean Sea Level PressureIn, Min Sea Level PressureIn, Max VisibilityMiles, Mean VisibilityMiles, Min VisibilityMiles, Max Wind SpeedMPH, Mean Wind SpeedMPH, Max Gust SpeedMPH,PrecipitationIn, CloudCover, Events, WindDirDegrees
2014-7-1,64,56,50,53,51,48,96,83,58,30.19,30.00,29.79,10,10,10,7,4,,0.00,7,,337
2014-7-2,71,62,55,55,52,46,96,80,51,29.81,29.75,29.66,10,9,2,13,5,,0.14,7,Rain,327
2014-7-3,64,58,53,55,53,51,97,85,72,29.88,29.86,29.81,10,10,8,15,4,,0.01,6,,258
2014-7-4,59,56,52,52,51,50,96,88,75,29.91,29.89,29.87,10,9,2,9,2,,0.07,7,Rain,255
2014-7-5,69,59,50,52,50,46,96,72,49,29.88,29.82,29.79,10,10,10,13,5,,0.00,6,,110
2014-7-6,62,58,55,51,50,46,80,71,58,30.13,30.07,29.89,10,10,10,20,10,29,0.00,6,Rain,213
2014-7-7,61,57,55,56,53,51,96,87,75,30.10,30.07,30.05,10,9,4,16,4,25,0.14,8,Rain,211
2014-7-8,55,54,53,54,53,51,100,94,86,30.10,30.06,30.04,10,6,2,12,5,23,0.84,8,Rain,159
2014-7-9,57,55,53,56,54,52,100,96,83,30.24,30.18,30.11,10,7,2,9,5,,0.13,8,Rain,201
2014-7-10,61,56,53,53,52,51,100,90,75,30.23,30.17,30.03,10,8,2,8,3,,0.03,8,Rain,215
2014-7-11,57,56,54,56,54,51,100,94,84,30.02,30.00,29.98,10,5,2,12,5,,1.28,8,Rain,250
2014-7-12,59,56,55,58,56,55,100,97,93,30.18,30.06,29.99,10,6,2,15,7,26,0.32,8,Rain,275
2014-7-13,57,56,55,58,56,55,100,98,94,30.25,30.22,30.18,10,5,1,8,4,,0.29,8,Rain,291
2014-7-14,61,58,55,58,56,51,100,94,83,30.24,30.23,30.22,10,7,0,16,4,,0.01,8,Fog,307
2014-7-15,64,58,55,53,51,48,93,78,64,30.27,30.25,30.24,10,10,10,17,12,,0.00,6,,318
2014-7-16,61,56,52,51,49,47,89,76,64,30.27,30.23,30.16,10,10,10,15,6,,0.00,6,,294
2014-7-17,59,55,51,52,50,48,93,84,75,30.16,30.04,29.82,10,10,6,9,3,,0.11,7,Rain,232
2014-7-18,63,56,51,54,52,50,100,84,67,29.79,29.69,29.65,10,10,7,10,5,,0.05,6,Rain,299
2014-7-19,60,57,54,55,53,51,97,88,75,29.91,29.82,29.68,10,9,2,9,2,,0.00,8,,292
2014-7-20,57,55,52,54,52,50,94,89,77,29.92,29.87,29.78,10,8,2,13,4,,0.31,8,Rain,155
2014-7-21,69,60,52,53,51,50,97,77,52,29.99,29.88,29.78,10,10,10,13,4,,0.00,5,,297
2014-7-22,63,59,55,56,54,52,90,84,77,30.11,30.04,29.99,10,10,10,9,3,,0.00,6,Rain,240
2014-7-23,62,58,55,54,52,50,87,80,72,30.10,30.03,29.96,10,10,10,8,3,,0.00,7,,230
2014-7-24,59,57,54,54,52,51,94,84,78,29.95,29.91,29.89,10,9,3,17,4,28,0.06,8,Rain,207
2014-7-25,57,55,53,55,53,51,100,92,81,29.91,29.87,29.83,10,8,2,13,3,,0.53,8,Rain,141
2014-7-26,57,55,53,57,55,54,100,96,93,29.96,29.91,29.87,10,8,1,15,5,24,0.57,8,Rain,216
2014-7-27,61,58,55,55,54,53,100,92,78,30.10,30.05,29.97,10,9,2,13,5,,0.30,8,Rain,213
2014-7-28,59,56,53,57,54,51,97,94,90,30.06,30.00,29.96,10,8,2,9,3,,0.61,8,Rain,261
2014-7-29,61,56,51,54,52,49,96,89,75,30.13,30.02,29.95,10,9,3,14,4,,0.25,6,Rain,153
2014-7-30,61,57,54,55,53,52,97,88,78,30.31,30.23,30.14,10,10,8,8,4,,0.08,7,Rain,160
2014-7-31,66,58,50,55,52,49,100,86,65,30.31,30.29,30.26,10,9,3,10,4,,0.00,3,,217

 

highs_lows.py

import csv
from datetime import datetime

import matplotlib as mpl
from matplotlib import pyplot as plt

# Use a font with Chinese glyphs so the Chinese labels render correctly
mpl.rcParams['font.sans-serif'] = ['SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# Get dates, high, and low temperatures from file.
filename = 'death_valley_2014.csv'
with open(filename) as f:
    reader = csv.reader(f)
    header_row = next(reader)
    # print(header_row)

    # for index,column_header in enumerate(header_row):
    #     print(index,column_header)

    dates, highs, lows = [], [], []
    for row in reader:
        try:
            current_date = datetime.strptime(row[0], "%Y-%m-%d")
            high = int(row[1])
            low = int(row[3])
        except ValueError:
            # Skip rows with missing or malformed values. Print the raw date
            # field, because current_date may not have been set for this row.
            print(row[0], 'missing data')
        else:
            dates.append(current_date)
            highs.append(high)
            lows.append(low)

# Plot the data
fig = plt.figure(dpi=120, figsize=(10, 6))
plt.plot(dates, highs, c='red', alpha=0.5)    # alpha sets the transparency
plt.plot(dates, lows, c='blue', alpha=0.5)
# fill_between takes one series of x values and two series of y values
# and shades the area between the two y series
plt.fill_between(dates, highs, lows, facecolor='orange', alpha=0.1)

# Format the chart
# Title: "Daily high and low temperatures, Death Valley, California, 2014"
plt.title('2014年加利福尼亞死亡谷日氣溫最高最低圖', fontsize=24)
plt.xlabel('日(D)', fontsize=16)      # "Day (D)"
fig.autofmt_xdate()                   # draw the date labels at a slant
plt.ylabel('溫度(F)', fontsize=16)    # "Temperature (F)"
plt.tick_params(axis='both', which='major', labelsize=16)
# plt.axis([0,31,54,72])  # custom axis ranges
plt.savefig('highs_lows.png', bbox_inches='tight')

plt.show()
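
Reading columns by position (row[1], row[3]) depends on the column order in the file. One possible variation, sketched below, uses csv.DictReader from the standard library to select columns by header name. The data is an in-memory excerpt of the first four columns shown above; the third row has its maximum temperature blanked out on purpose (an invented gap, for illustration only) to exercise the missing-data branch.

```python
import csv
import io
from datetime import datetime

# First four columns of the sample data; the blank high on 2014-7-3 is invented
sample = """AKDT,Max TemperatureF,Mean TemperatureF,Min TemperatureF
2014-7-1,64,56,50
2014-7-2,71,62,55
2014-7-3,,58,53
"""

dates, highs, lows = [], [], []
reader = csv.DictReader(io.StringIO(sample))
for row in reader:
    try:
        current_date = datetime.strptime(row['AKDT'], '%Y-%m-%d')
        high = int(row['Max TemperatureF'])
        low = int(row['Min TemperatureF'])
    except ValueError:
        # int('') raises ValueError, so rows with missing readings are skipped
        print(row['AKDT'], 'missing data')
    else:
        dates.append(current_date)
        highs.append(high)
        lows.append(low)

print(highs, lows)
```

Only the two complete rows end up in the lists; the row with the blank reading is reported and skipped.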

6.3 Result

  

7 Building a World Population Map: JSON Format


7.1 Requirements

Download population data in JSON format and process it with the json module.

7.2 Source Code

JSON data: population_data.json

countries.py

from pygal.maps.world import COUNTRIES

for country_code in sorted(COUNTRIES.keys()):
    print(country_code, COUNTRIES[country_code])

country_codes.py

from pygal.maps.world import COUNTRIES


def get_country_code(country_name):
    """Return the Pygal 2-letter country code for the given country."""
    for code, name in COUNTRIES.items():
        if name == country_name:
            return code
    # If the country wasn't found, return None.
    return None


if __name__ == '__main__':
    print(get_country_code('Thailand'))
    # print(get_country_code('Andorra'))
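
get_country_code scans the whole COUNTRIES mapping on every call. When many names have to be looked up, one option is to invert the mapping once and use direct dictionary lookups. The sketch below uses a three-entry stand-in dictionary (SAMPLE_COUNTRIES, a name invented here) in place of pygal's COUNTRIES so that it runs without pygal installed.

```python
# Stand-in for pygal's COUNTRIES mapping (code -> country name); assumed subset
SAMPLE_COUNTRIES = {
    'ad': 'Andorra',
    'th': 'Thailand',
    'us': 'United States',
}

# Invert once: country name -> code
CODE_BY_NAME = {name: code for code, name in SAMPLE_COUNTRIES.items()}


def get_country_code(country_name):
    """Return the 2-letter code for country_name, or None if it is unknown."""
    return CODE_BY_NAME.get(country_name)


print(get_country_code('Thailand'))   # th
print(get_country_code('Atlantis'))   # None
```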

americas.py

import pygal

wm =pygal.maps.world.World()
wm.title = 'North, Central, and South America'

wm.add('North America', ['ca', 'mx', 'us'])
wm.add('Central America', ['bz', 'cr', 'gt', 'hn', 'ni', 'pa', 'sv'])
wm.add('South America', ['ar', 'bo', 'br', 'cl', 'co', 'ec', 'gf',
    'gy', 'pe', 'py', 'sr', 'uy', 've'])
wm.add('Asia', ['cn', 'jp', 'th'])
wm.render_to_file('americas.svg')

 

world_population.py

# coding=utf-8
import json

import pygal
from pygal.style import RotateStyle
from pygal.style import LightColorizedStyle

from country_codes import get_country_code

# Load the JSON data
filename = 'population_data.json'
with open(filename) as f:
    pop_data = json.load(f)
    # print(pop_data[1])


# Build a dictionary of population by country code
cc_populations = {}
# cc1_populations={}

# Collect each country's population for 2010
for pop_dict in pop_data:
    if pop_dict['Year'] == '2010':
        country_name = pop_dict['Country Name']
        # Convert the string value to an integer (via float, in case it has a decimal part)
        population = int(float(pop_dict['Value']))
        # print(country_name + ":" + str(population))
        code = get_country_code(country_name)
        if code:
            cc_populations[code] = population
    # elif pop_dict['Year'] == '2009':
    #     country_name = pop_dict['Country Name']
    #     population = int(float(pop_dict['Value']))
    #     # print(country_name + ":" + str(population))
    #     code = get_country_code(country_name)
    #     if code:
    #         cc1_populations[code] = population

# Group the countries into three population levels
cc_pops_1, cc_pops_2, cc_pops_3 = {}, {}, {}
for cc, pop in cc_populations.items():
    if pop < 10000000:
        cc_pops_1[cc] = pop
    elif pop < 1000000000:
        cc_pops_2[cc] = pop
    else:
        cc_pops_3[cc] = pop

# print(len(cc_pops_1),len(cc_pops_2),len(cc_pops_3))

wm_style = RotateStyle('#336699', base_style=LightColorizedStyle)
wm = pygal.maps.world.World(style=wm_style)
wm.title = '2010年世界各國人口統計圖'    # "World population by country, 2010"
wm.add('0-10m', cc_pops_1)
wm.add('10m-1bn', cc_pops_2)
wm.add('>1bn', cc_pops_3)
# wm.add('2009', cc1_populations)

wm.render_to_file('world_populations.svg')
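
The three-way split by population can also be written as a small function, which makes the thresholds easy to test. This is a sketch: the function name group_by_population and the codes and population figures in the example are made up for illustration and are not taken from population_data.json.

```python
def group_by_population(cc_populations):
    """Split {code: population} into three dicts: <10 million, <1 billion, and the rest."""
    small, medium, large = {}, {}, {}
    for cc, pop in cc_populations.items():
        if pop < 10_000_000:
            small[cc] = pop
        elif pop < 1_000_000_000:
            medium[cc] = pop
        else:
            large[cc] = pop
    return small, medium, large


# Made-up codes and figures, chosen to land in each group and on each boundary
example = {'aa': 9_999_999, 'bb': 10_000_000, 'cc': 999_999_999, 'dd': 1_000_000_000}
small, medium, large = group_by_population(example)
print(len(small), len(medium), len(large))   # 1 2 1
```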

 

7.3 Result

countries.py

world_population.py

 

8 Visualizing GitHub Repositories with Pygal


8.1 Requirements

Call a web API and visualize data about GitHub repositories: https://api.github.com/search/repositories?q=language:python&sort=stars

8.2 Source Code

python_repos.py

# coding=utf-8
import requests
import pygal
from pygal.style import LightColorizedStyle as LCS, LightenStyle as LS


# Make an API call, and store the response.
url = 'https://api.github.com/search/repositories?q=language:python&sort=stars'
r = requests.get(url)
print("Status code:", r.status_code)  # 200 means the request succeeded

response_dict = r.json()
# print(response_dict.keys())
print("Total repositories:", response_dict['total_count'])

# Explore information about the repositories.
repo_dicts = response_dict['items']
print("Repositories returned:", len(repo_dicts))

# Inspect the information for each project
# repo_dict =repo_dicts[0]
# print('\n\neach repository:')
# for repo_dict in repo_dicts:
#     print("\nName:",repo_dict['name'])
#     print("Owner:",repo_dict['owner']['login'])
#     print("Stars:",repo_dict['stargazers_count'])
#     print("Repository:",repo_dict['html_url'])
#     print("Description:",repo_dict['description'])
# List the keys available for each project
# print('\nKeys:',len(repo_dict))
# for key in sorted(repo_dict.keys()):
#     print(key)

# Collect the project names and their star counts
names, stars = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    stars.append(repo_dict['stargazers_count'])

# Visualization
my_style = LS('#333366', base_style=LCS)

my_config = pygal.Config()            # create a pygal Config instance
my_config.x_label_rotation = 45       # rotate the x-axis labels by 45 degrees
my_config.show_legend = False         # hide the legend
my_config.title_font_size = 24        # font sizes for the title, labels, and major labels
my_config.label_font_size = 14
my_config.major_label_font_size = 18
my_config.truncate_label = 15         # shorten long project names to 15 characters
my_config.show_y_guides = False       # hide the horizontal guide lines
my_config.width = 1000                # custom chart width

chart = pygal.Bar(my_config, style=my_style)
chart.title = 'Most-Starred Python Projects on GitHub'
chart.x_labels = names
chart.add('', stars)
chart.render_to_file('python_repos.svg')
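
pygal bars can carry more than a number: each value may be given as a dictionary with 'value', 'label', and 'xlink' keys, which adds a tooltip description and a clickable link to each bar. The sketch below builds such dictionaries from a hand-written stand-in for the API response (the two sample repositories are invented); it does not call the network or render a chart.

```python
# Invented stand-in for response_dict['items']; the field names are the ones used above
repo_dicts = [
    {'name': 'alpha', 'stargazers_count': 1200,
     'html_url': 'https://github.com/example/alpha', 'description': 'First sample project'},
    {'name': 'beta', 'stargazers_count': 800,
     'html_url': 'https://github.com/example/beta', 'description': None},
]

names, plot_dicts = [], []
for repo_dict in repo_dicts:
    names.append(repo_dict['name'])
    plot_dicts.append({
        'value': repo_dict['stargazers_count'],
        # description can be null in the API response, so fall back to an empty string
        'label': repo_dict['description'] or '',
        'xlink': repo_dict['html_url'],
    })

print(names)
print(plot_dicts[0])
```

The resulting list can be passed as the second argument of chart.add() in place of the plain list of star counts.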

 

8.3 Result


 

 


