Seaborn Tutorial

Reposted article (see the original). 2020-02-19 10:05. Tags: seaborn, Python

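Chapter outline

Introduction to Seaborn
Installing Seaborn
Loading libraries and the data file
Seaborn's plotting functions
Customizing with Matplotlib
The role of pandas
Seaborn themes
Color palettes
Overlaying plots
Melting data
Mini gallery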

Installing Seaborn

First, make sure the following are installed on your computer:

Python 2.7+ or Python 3
Pandas
Matplotlib
Seaborn
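Jupyter Notebook (optional)

Open Jupyter Notebook. After a few seconds, a browser window titled Home opens.

Click New on the right to create a notebook. A new browser window opens; click the title at the top to name the file.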
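Before starting, it can help to confirm that the packages listed above are actually installed. The following is a minimal sketch using only the standard library (it assumes Python 3.8 or later); the helper name installed_versions is made up for this illustration and is not part of any of the libraries.

```python
from importlib import metadata


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version,
    or None when the package is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    found = installed_versions(["pandas", "matplotlib", "seaborn"])
    for name, version in found.items():
        print(name, version or "not installed")
```

Any package reported as "not installed" needs to be installed before the rest of the tutorial will run.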

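Introduction to Seaborn

Seaborn is a high-level interface to Matplotlib that makes visual data analysis much more convenient.

Loading libraries and the data file

Load pandas, matplotlib, and seaborn.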

# coding: utf-8

# Load pandas
import pandas as pd

# Load matplotlib
from matplotlib import pyplot as plt

# Display plots inline in the notebook
%matplotlib inline

# Load seaborn
import seaborn as sb

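A data file is provided for this tutorial; the download link is
Pokemon.csv

Read the data file with pandas and display the first five rows.

# Read Pokemon.csv with pandas
df = pd.read_csv("f:/Pokemon.csv", encoding = "unicode_escape")

# Show the first five rows; the notebook renders the result as a table.
df.head()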

	#	Name	Type 1	Type 2	Total	HP	Attack	Defense	Sp. Atk	Sp. Def	Speed	Stage	Legendary
0	1	Bulbasaur	Grass	Poison	318	45	49	49	65	65	45	1	False
1	2	Ivysaur	Grass	Poison	405	60	62	63	80	80	60	2	False
2	3	Venusaur	Grass	Poison	525	80	82	83	100	100	80	3	False
3	4	Charmander	Fire	NaN	309	39	52	43	60	50	65	1	False
4	5	Charmeleon	Fire	NaN	405	58	64	58	80	65	80	2	False
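Since the file path above is specific to the author's machine, here is a small self-contained sketch of the same read_csv() and head() pattern. It assumes nothing about the real file beyond the five rows shown above, which are typed out as an in-memory CSV; the names sample_csv and sample_df are illustrative.

```python
import io

import pandas as pd

# The five rows displayed above, typed out as CSV text (not the full file).
sample_csv = """#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Stage,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,2,False
3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,3,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,Charmeleon,Fire,,405,58,64,58,80,65,80,2,False
"""

# read_csv accepts any file-like object, so StringIO stands in for a path.
sample_df = pd.read_csv(io.StringIO(sample_csv))

# head(n) returns the first n rows as a new DataFrame.
print(sample_df.head(3))

# Empty fields (Type 2 for the two Fire rows) are read as NaN.
print(sample_df["Type 2"].isna().sum())
```

This is why Type 2 shows NaN for Charmander and Charmeleon in the table above: those cells are empty in the data.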

# Draw a scatter plot (lmplot also fits a regression line by default)
sb.lmplot(x = 'Attack', y = 'Defense', data = df)

[Figure: scatter plot of Attack against Defense with a fitted regression line]

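Seaborn's plotting functions

One of Seaborn's greatest strengths is its wide range of plotting functions. The scatter plot above was drawn with a single line of code using lmplot(), with Attack on the x axis and Defense on the y axis.

lmplot() is not a dedicated scatter plot function; it is meant for fitting and plotting regression lines. (Seaborn 0.9 and later also provide scatterplot() for plain scatter plots.)
Fortunately, we can get the scatter plot we want by setting its parameters: fit_reg = False removes the regression line, and the hue parameter colors the points by each Pokémon's evolution stage.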

# Remove the regression line and color the points by evolution stage to get the scatter plot:

sb.lmplot(x = 'Attack', y = 'Defense', data = df,
         fit_reg = False,
         hue = 'Stage')

[Figure: scatter plot of Attack against Defense, colored by Stage]

The scatter plot shows that all data points have positive values, yet the axes start below zero. We can improve on that.
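For comparison, a plain scatter plot can also be drawn without lmplot(). The sketch below assumes Seaborn 0.9 or later, which provides scatterplot(); it uses only the Attack, Defense, and Stage values of the five rows shown earlier, and the name sample_df is illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import pandas as pd
import seaborn as sb
from matplotlib import pyplot as plt

# Attack, Defense, and Stage for the five rows shown earlier.
sample_df = pd.DataFrame({
    "Attack": [49, 62, 82, 52, 64],
    "Defense": [49, 63, 83, 43, 58],
    "Stage": [1, 2, 3, 1, 2],
})

fig, ax = plt.subplots()

# scatterplot draws points only (no regression fit); hue colors them by Stage.
sb.scatterplot(x="Attack", y="Defense", hue="Stage", data=sample_df, ax=ax)

print(ax.get_xlabel(), ax.get_ylabel())
```

Unlike lmplot(), which creates its own figure, scatterplot() draws onto a Matplotlib axes, so it combines easily with other Matplotlib calls.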

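Customizing with Matplotlib

Although Seaborn is a high-level interface to Matplotlib, we sometimes need to use Matplotlib directly, for example to set the axis ranges. Here we set the xlim and ylim properties of the current Matplotlib axes.

# Set the axis ranges and labels.
# Run this in the same cell as the lmplot() call; in a separate cell,
# plt.gca() creates a new, empty axes instead of modifying the scatter plot.
plt.gca().set(xlim = (0, None), ylim = (0, None),
             xlabel='Attack', ylabel='Defense')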

[Figure: axes starting at 0, labeled Attack (x) and Defense (y)]
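The axis-limit step can be sketched with Matplotlib alone. This is an illustrative example rather than the tutorial's own code: it plots the Attack and Defense values of the five rows shown earlier and fixes only the lower bound of each axis, which is what passing None as the upper limit does.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
from matplotlib import pyplot as plt

# Attack and Defense for the five rows shown earlier.
attack = [49, 62, 82, 52, 64]
defense = [49, 63, 83, 43, 58]

fig, ax = plt.subplots()
ax.scatter(attack, defense)

# Fix only the lower bound of each axis; the upper bound stays automatic.
ax.set_xlim(left=0)
ax.set_ylim(bottom=0)

print(ax.get_xlim(), ax.get_ylim())
```

Because the limits are set on the same axes that holds the points, the plot and its limits stay together, which is the property that matters when working cell by cell in a notebook.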

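The role of pandas

Although this is a Seaborn tutorial, pandas still plays a very important role in practice. Next we draw a box plot of the Pokémon combat stats.

sb.boxplot(data = df)
# The resulting box plot: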

[Figure: box plot of the numeric columns in df]

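This is a good start, but we can remove a few columns we don't need.

Remove Total, because we already have the individual stats.
Remove Stage and Legendary, because they are not combat stats.
We can create a new dataset, stats_df, that meets these requirements.

# Create the new dataset
stats_df = df.drop(['Total', 'Stage', 'Legendary'], axis = 1)

# Box plot
sb.boxplot(data = stats_df)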

[Figure: box plot of the columns in stats_df]

The result is an improved box plot.
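The column-removal step can be tried on a tiny table. The sketch below uses a made-up two-row DataFrame (the names sample_df and sample_stats are illustrative) and shows that drop() returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "Total": [318, 309],
    "HP": [45, 39],
    "Attack": [49, 52],
    "Stage": [1, 1],
    "Legendary": [False, False],
})

# drop returns a new DataFrame; sample_df itself is left unchanged.
sample_stats = sample_df.drop(columns=["Total", "Stage", "Legendary"])

print(list(sample_stats.columns))
print(list(sample_df.columns))
```

Here columns=[...] is equivalent to passing the list with axis = 1, as the tutorial does.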

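Seaborn themes

Another advantage of Seaborn is its well-designed, out-of-the-box style themes. The default theme is "darkgrid".
Next, we switch the theme to "whitegrid" and create a violin plot.

Violin plots are often used as an alternative to box plots.
A violin plot shows the distribution of the data through the thickness of the violin, rather than only summary statistics.
Using each Pokémon's primary type, we can visualize the distribution of Attack.

# Set the theme
sb.set_style('whitegrid')

# Violin plot
sb.violinplot(x = 'Type 1', y = 'Attack', data = df)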

[Figure: violin plot of Attack by Type 1]

This gives the violin plot: the x axis shows each Pokémon's Type 1, and the y axis shows the Attack values.
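To make the difference between the two plot types concrete: a box plot is built from a handful of summary values (the quartiles), while a violin plot draws an estimated density over all the values. The summary part can be sketched with NumPy, using the Attack values of the five rows shown earlier purely as an illustration.

```python
import numpy as np

# Attack values for the five rows shown earlier.
attack = np.array([49, 62, 82, 52, 64])

# A box plot draws these quartiles; a violin plot instead draws an
# estimated density curve over the same values.
q1, median, q3 = np.percentile(attack, [25, 50, 75])

print(q1, median, q3)
```

Two groups with the same quartiles can still have differently shaped distributions, which is the information a violin plot adds.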

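Color palettes

Seaborn lets us set colors to suit our needs. We can create an ordered Python list of hexadecimal color values; the values can be found on Bulbapedia.

# Create the color list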
pkmn_type_colors = ['#78C850',
                    '#F08030',
                    '#6890F0',
                    '#A8B820',
                    '#A8A878',
                    '#A040A0',
                    '#F8D030',
                    '#E0C068',
                    '#EE99AC',
                    '#C03028',
                    '#F85888',
                    '#B8A038',
                    '#705898',
                    '#98D8D8',
                    '#7038F8'
                   ]

# Use the palette in the violin plot
sb.violinplot(x = 'Type 1', y = 'Attack', data = df,
             palette = pkmn_type_colors)

[Figure: violin plot of Attack by Type 1 with the custom palette]
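Each entry in the palette list is a hexadecimal color string of the form #RRGGBB. As a small illustration of what those strings encode, the sketch below converts a few of them to red, green, and blue components; hex_to_rgb is a hypothetical helper written for this example, not a Seaborn function.

```python
def hex_to_rgb(color):
    """Convert a '#RRGGBB' string to an (r, g, b) tuple of ints in 0-255."""
    if len(color) != 7 or not color.startswith("#"):
        raise ValueError(f"expected '#RRGGBB', got {color!r}")
    return tuple(int(color[i:i + 2], 16) for i in (1, 3, 5))


# First three entries of the palette list above.
for color in ["#78C850", "#F08030", "#6890F0"]:
    print(color, hex_to_rgb(color))
```

A check like this can catch a mistyped entry before it reaches a plotting call.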

To show each of the 151 Pokémon in the data file in a simple way, we can use a swarm plot.

sb.swarmplot(x = 'Type 1', y = 'Attack', data = df,
            palette = pkmn_type_colors)

[Figure: swarm plot of Attack by Type 1]

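Overlaying plots

We have drawn a violin plot and a swarm plot. Seaborn lets us combine the two in a single figure, as follows:

First, we set the figure size with Matplotlib.
Then, we draw the violin plot, using inner = None to remove the bars inside the violins.
Next, we draw the swarm plot and make the points black.
Finally, we set a title with Matplotlib.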

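# Set the figure size
plt.figure(figsize = (10, 6))

# Create the violin plot
sb.violinplot(x = 'Type 1', y = 'Attack', data = df,
              inner = None,  # remove the bars inside the violins
              palette = pkmn_type_colors)

# Create the swarm plot with black points.
# No palette is passed here, so color = 'k' takes effect;
# alpha sets the transparency (0.7 is an illustrative value).
sb.swarmplot(x = 'Type 1', y = 'Attack', data = df,
             color = 'k',
             alpha = 0.7)

# Set the title
plt.title('Attack by Type')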

[Figure: Attack by Type, a violin plot overlaid with a swarm plot]

Now we can clearly see the Attack values of the different Pokémon. But how do we look at the other stats?

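Melting data

To show the other stats, we could of course repeat the steps above and draw several figures. But we can also show all of the stats in a single figure, and this is where pandas comes in.
We can use the pandas melt() function to combine several columns into one, which lets us compare stats directly across Pokémon. Here melt() takes three arguments:

The DataFrame to melt.
The ID variables to keep; pandas melts all the other variables.
The name of the new, melted variable.

# Melt the data
melted_df = pd.melt(stats_df,
                   id_vars = ['Name', 'Type 1', 'Type 2'],
                   var_name = 'Stat')

# First five rows
melted_df.head()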

	Name	Type 1	Type 2	Stat	value
0	Bulbasaur	Grass	Poison	#	1
1	Ivysaur	Grass	Poison	#	2
2	Venusaur	Grass	Poison	#	3
3	Charmander	Fire	NaN	#	4
4	Charmeleon	Fire	NaN	#	5
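What melt() does is easier to see on a very small table. The sketch below is an illustration with two made-up-size inputs (two Pokémon, three stat columns taken from the rows shown earlier); the names wide and long are chosen for this example only.

```python
import pandas as pd

wide = pd.DataFrame({
    "Name": ["Bulbasaur", "Charmander"],
    "HP": [45, 39],
    "Attack": [49, 52],
    "Defense": [49, 43],
})

# Name is kept as an identifier; HP, Attack, and Defense are stacked
# into two columns: Stat (the old column name) and value.
long = pd.melt(wide, id_vars=["Name"], var_name="Stat")

print(long)
print(long.shape)  # 2 Pokemon x 3 stats = 6 rows
```

Every column that is not listed in id_vars gets melted, which is why the # column appears as a Stat in the output above: stats_df still contains it.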

We draw a swarm plot of the melted DataFrame, melted_df.

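# Draw the swarm plot
sb.swarmplot(x = 'Stat', y = 'value', data = melted_df,
            hue = 'Type 1')

# This gives the swarm plot below. The x axis shows the variables melted into Stat
# (the six stats, plus the # column that stats_df still contains), the y axis shows
# their values, and the colors indicate each Pokémon's Type 1.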

[Figure: swarm plot of value by Stat, colored by Type 1]

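A few details of this chart need polishing:

Enlarge the chart.
Separate the points by hue with dodge = True (named split in older Seaborn versions).
Use our custom colors.
Adjust the y-axis range.
Place the legend on the right.

# Enlarge the chart
plt.figure(figsize = (10, 8))

# Draw the swarm plot, separating the points by hue and using the custom colors.
# dodge replaces the older split parameter, which Seaborn reports as renamed.
sb.swarmplot(x = 'Stat', y = 'value', data = melted_df,
            hue = "Type 1",
            dodge = True,
            palette = pkmn_type_colors)

# Adjust the y-axis range
plt.ylim(0,260)

# Place the legend on the right
plt.legend(bbox_to_anchor = (1,1), loc = 2)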

[Figure: enlarged swarm plot of value by Stat, points separated by Type 1, legend on the right]

The result is a chart with the details polished.

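Mini gallery

Heatmap

Heatmaps help visualize matrix-like data.

# Compute the correlations
# (in pandas 2.0 and later, text columns must be excluded first,
# for example with stats_df.corr(numeric_only = True))
corr = stats_df.corr()

# Heatmap
sb.heatmap(corr)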

[Figure: heatmap of the correlations between the columns of stats_df]
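The matrix that a heatmap displays can be inspected directly. The sketch below is an illustration on the five rows shown earlier (the names sample_df and numeric are made up for this example). It selects the numeric columns explicitly before calling corr(), since depending on the pandas version, text columns are either skipped silently or cause an error.

```python
import pandas as pd

sample_df = pd.DataFrame({
    "Name": ["Bulbasaur", "Ivysaur", "Venusaur", "Charmander", "Charmeleon"],
    "HP": [45, 60, 80, 39, 58],
    "Attack": [49, 62, 82, 52, 64],
    "Defense": [49, 63, 83, 43, 58],
})

# Keep only numeric columns so the text column Name is excluded explicitly.
numeric = sample_df.select_dtypes(include="number")

# corr() returns a square DataFrame of pairwise correlations.
corr = numeric.corr()

print(corr.round(2))
```

The result is a square table with one row and one column per stat and 1.0 on the diagonal, which is the matrix-like shape a heatmap expects.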

Histogram

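Histograms show the distribution of a numeric variable.

# Draw a histogram
# (distplot is deprecated since Seaborn 0.11; newer versions offer histplot and displot)
sb.distplot(df.Attack)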

[Figure: histogram of Attack with a density curve]
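A histogram is a count of values per bin. As a rough illustration of the numbers behind the bars, the sketch below bins the Attack values of the five rows shown earlier with NumPy; the bin edges are chosen by hand for this example and are not the ones Seaborn would pick.

```python
import numpy as np

# Attack values for the five rows shown earlier.
attack = np.array([49, 62, 82, 52, 64])

# Bin edges chosen by hand for illustration: [40, 60), [60, 80), [80, 100].
counts, edges = np.histogram(attack, bins=[40, 60, 80, 100])

print(counts)
```

Each count becomes the height of one bar; changing the bin edges changes the shape of the histogram, so the choice of bins is part of the picture.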

Bar Plot

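Bar plots help visualize the distribution of categorical variables.

# Draw the bar plot (a count of Pokémon per Type 1)
sb.countplot(x = 'Type 1', data = df, palette = pkmn_type_colors)

# Rotate the x-axis labels
plt.xticks(rotation = -45)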

[Figure: bar plot of the number of Pokémon per Type 1]
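The bar heights in a count plot are simply the number of rows in each category. A minimal pandas sketch of that tally, using the Type 1 values of the five rows shown earlier (the name type_1 is illustrative):

```python
import pandas as pd

# Type 1 for the five rows shown earlier.
type_1 = pd.Series(["Grass", "Grass", "Grass", "Fire", "Fire"], name="Type 1")

# value_counts tallies how often each category occurs, most frequent first.
counts = type_1.value_counts()

print(counts)
```

Printing these counts next to the plot is a quick way to read exact values that are hard to judge from bar heights alone.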

Factor plots

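Factor plots make it easy to separate plots by category.

# Separate the plots by category
# (factorplot has been renamed to catplot)
g = sb.catplot(x = 'Type 1',
               y = 'Attack',
               data = df,
               hue = 'Stage',  # color the points by Stage
               col = 'Stage',  # one plot per Stage
               kind = 'swarm', # draw swarm plots
              )

# Rotate the x-axis labels on every plot
g.set_xticklabels(rotation = -45)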

[Figure: three swarm plots of Attack by Type 1, one per Stage]

This gives three plots, one per Stage, with the point color indicating the Stage.

Density Plot

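A density plot shows the joint distribution of two variables.
The contour lines outline regions of estimated density; the innermost contours mark where the observations are most concentrated.

# Create the density plot
# (Seaborn 0.11 and later expect keywords: sb.kdeplot(x = df.Attack, y = df.Defense))
sb.kdeplot(df.Attack, df.Defense)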

[Figure: density contour plot of Attack against Defense]

Joint Distribution Plot

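A joint distribution plot combines the information from a scatter plot and histograms to give detailed information about a bivariate distribution.

# Create the joint distribution plot
sb.jointplot(x = 'Attack', y = 'Defense', data = df)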

[Figure: joint distribution plot of Attack against Defense]

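This tutorial covers only Seaborn's commonly used plotting functions. The Example gallery shows more powerful features that are worth learning and exploring.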
