Kernel density plot (a smooth curve fitted to a histogram)


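A kernel density plot can be read as an estimated probability density curve. Its vertical axis shows density rather than a raw count: the higher the curve, the more concentrated the data are around that value. The area between the curve and the horizontal axis is 1.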
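To make the "area is 1" statement concrete, here is a minimal sketch that builds a Gaussian kernel density estimate by hand and integrates it numerically. The helper name `gaussian_kde_1d`, the synthetic sample, and the choice of Scott's rule for the bandwidth are all assumptions made for illustration; seaborn's own implementation may differ in details.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate at every point of `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Standardized distance between every grid point and every sample.
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    # Average the kernels and divide by the bandwidth so the curve integrates to 1.
    return kernels.mean(axis=1) / bandwidth


rng = np.random.default_rng(0)
samples = rng.normal(loc=20.0, scale=4.0, size=200)  # synthetic stand-in data

# Scott's rule of thumb for one-dimensional data (an assumption; other rules exist).
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

# Extend the grid beyond the data so the tails of the kernels are included.
grid = np.linspace(samples.min() - 5 * bandwidth,
                   samples.max() + 5 * bandwidth, 2000)
density = gaussian_kde_1d(samples, grid, bandwidth)

# Riemann-sum approximation of the area under the curve.
area = density.sum() * (grid[1] - grid[0])
print(f"area under the KDE curve: {area:.4f}")
```

The values in `density` are not counts. They can be larger than 1 when the data are tightly clustered, because only the area is constrained.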

Method 1: seaborn's kdeplot function is dedicated to drawing kernel density estimates.

References: https://www.jianshu.com/p/844f66d00ac1

https://yq.aliyun.com/articles/682843

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# The download may need a proxy on some networks, so a local copy is kept
# df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
# df.to_csv('../data/mpg_ggplot2.csv', index=False)
df = pd.read_csv('../data/mpg_ggplot2.csv')

df.info()  # info() prints the summary itself and returns None
print(df.shape)
# Draw Plot
plt.figure(figsize=(16, 10), dpi=90)  # dpi sets the figure resolution in dots per inch
# Select the cty values of the rows where cyl equals 4 and plot their density
# fill=True replaces shade=True, which is deprecated since seaborn 0.11
sns.kdeplot(df.loc[df['cyl'] == 4, "cty"], fill=True, color="g", label="Cyl=4", alpha=0.5)
sns.kdeplot(df.loc[df['cyl'] == 5, "cty"], fill=True, color="deeppink", label="Cyl=5", alpha=0.5)
sns.kdeplot(df.loc[df['cyl'] == 6, "cty"], fill=True, color="dodgerblue", label="Cyl=6", alpha=0.5)
sns.kdeplot(df.loc[df['cyl'] == 8, "cty"], fill=True, color="orange", label="Cyl=8", alpha=0.5)
# Decoration
plt.title('Density Plot of City Mileage by n_Cylinders', fontsize=22)
plt.legend()
plt.show()

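distplot() (not to be confused with the newer displot()) draws a histogram and a kernel density curve together. seaborn has deprecated distplot() since version 0.11, so recent versions emit a warning when it is called.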

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# The download may need a proxy on some networks, so a local copy is kept
# df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
# df.to_csv('../data/mpg_ggplot2.csv', index=False)
df = pd.read_csv('../data/mpg_ggplot2.csv')

# distplot combines the histogram hist() with the density curve kdeplot()
# bins sets the number of bins (bars) in the histogram
# Official documentation: http://seaborn.pydata.org/generated/seaborn.distplot.html
# Parameter notes: http://www.sohu.com/a/158933070_718302
plt.figure(figsize=(16, 10), dpi=90)
sns.distplot(df.loc[df['cyl'] == 4, "cty"], color="g", label="Cyl=4", bins=100)
sns.distplot(df.loc[df['cyl'] == 5, "cty"], color="deeppink", label="Cyl=5", bins=10)
plt.legend()
plt.show()

 

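Given a set of continuous values, split their range into small intervals, count how many values fall into each interval, and draw the resulting histogram together with a fitted curve.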
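The binning and counting step described above can be sketched with numpy.histogram. The sample here is synthetic and the choice of 10 bins is arbitrary; both are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
values = rng.normal(loc=5.8, scale=0.8, size=150)  # synthetic stand-in data

# Split the data range into 10 equal-width bins and count the values in each.
counts, edges = np.histogram(values, bins=10)

# density=True rescales the bar heights so that the total bar area is 1,
# which is the scale a density curve is drawn on.
density, _ = np.histogram(values, bins=10, density=True)
widths = np.diff(edges)

# Every bin is half-open [left, right) except the last, which includes its right edge.
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```

Plotting `counts` against the bin edges gives the histogram; `density` is the version that can share an axis with a fitted density curve.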

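Method 2: seaborn can do this in a few lines. By default the fitted curve is not a normal curve but a kernel density estimate, which follows the actual shape of the data more closely. Passing the fit parameter (for example fit=norm) additionally draws a fitted normal curve.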

import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import datasets
from scipy.stats import norm  # only needed when passing fit=norm

sns.set(style="ticks")
iris = datasets.load_iris()   # load the iris dataset
x = iris.data[:, 0]           # first column of the ndarray (sepal length)
sns.set_palette("hls")        # use the hls palette for all plots

# hist and kde both default to True; they toggle the histogram and the density curve
# fit draws a fitted distribution, e.g. fit=norm after "from scipy.stats import norm"
# sns.distplot(x, color="r", bins=100, kde=True)  # hist=False)
sns.distplot(x, bins=30, hist=True,
             kde_kws={'color': 'green', 'lw': 3, 'label': 'x'},
             hist_kws={'color': 'red', 'alpha': 0.2})
plt.show()
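To show what a fitted normal curve amounts to, the sketch below estimates the mean and standard deviation of a sample and evaluates the normal density with those two numbers. It uses synthetic data instead of the iris column, and the helper `normal_pdf` is written out by hand for illustration; scipy.stats.norm.fit should return the same two estimates.

```python
import numpy as np


def normal_pdf(t, mu, sigma):
    """Density of a normal distribution with mean `mu` and standard deviation `sigma`."""
    return np.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


rng = np.random.default_rng(2)
x = rng.normal(loc=5.8, scale=0.8, size=150)  # synthetic stand-in data

# Maximum-likelihood estimates of the two normal parameters.
mu = x.mean()
sigma = x.std()  # ddof=0, i.e. divide by n

# Evaluate the fitted curve on a grid wide enough to cover both tails.
grid = np.linspace(mu - 6 * sigma, mu + 6 * sigma, 1201)
curve = normal_pdf(grid, mu, sigma)

# Like the kernel density curve, the fitted normal curve encloses an area of 1.
area = curve.sum() * (grid[1] - grid[0])
print(f"mu={mu:.3f}, sigma={sigma:.3f}, area={area:.4f}")
```

Unlike the kernel density estimate, this curve is always symmetric and single-peaked, so a visible gap between the two curves suggests the data are not well described by a normal distribution.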

Official documentation: http://seaborn.pydata.org/generated/seaborn.distplot.html?highlight=distplot#seaborn.distplot

Reference: https://www.jianshu.com/p/65395b00adbc

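Method 3: round the values to one or two decimal places with round(), then count each rounded value with groupby and plot the counts. The result is much rougher than Method 1.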

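# f_train is assumed to be an existing DataFrame with a numeric column VAR00007
f_train['VAR00007'] = f_train['VAR00007'].apply(lambda x: round(x, 1))  # keep one decimal place
f_train = f_train.groupby(['VAR00007'])['VAR00007'].agg(['count']).reset_index()
f_train = f_train.sort_values(['VAR00007'])  # sort_values returns a new DataFrame
values = f_train['VAR00007'].tolist()  # rounded values (x-axis)
counts = f_train['count'].tolist()     # number of rows per rounded value (y-axis)
plt.scatter(values, counts)
plt.show()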

 

