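A kernel density plot can be read as a probability density plot. Its y-axis can loosely be read as how often values occur around each point on the x-axis (strictly, it is a density, not a count), and the total area between the curve and the x-axis is 1.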
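To see why the area is 1, here is a minimal hand-rolled Gaussian KDE in NumPy. It places a normal "bump" on every sample and averages them, i.e. f(x) = 1/(n·h) · Σ K((x − x_i)/h). The helper name gaussian_kde_manual, the synthetic sample, and the fixed bandwidth h = 1.0 are illustrative assumptions; seaborn chooses the bandwidth automatically and does not necessarily compute the curve this way.

```python
import numpy as np

def gaussian_kde_manual(data, grid, bandwidth):
    """Gaussian KDE of `data` evaluated at each point of `grid` (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from every grid point to every sample, shape (len(grid), len(data))
    u = (grid[:, None] - data[None, :]) / bandwidth
    # Standard normal kernel evaluated at those distances
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Sum the bumps and divide by n * h so the whole curve has area 1
    return kernel.sum(axis=1) / (len(data) * bandwidth)

rng = np.random.default_rng(0)
samples = rng.normal(loc=20, scale=3, size=200)  # synthetic data, not the mpg dataset
grid = np.linspace(0, 40, 2001)
density = gaussian_kde_manual(samples, grid, bandwidth=1.0)

# Trapezoidal rule for the area under the curve
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print(f"area under the KDE: {area:.4f}")
```

The y-values in density are densities: they can exceed 1 for tightly clustered data, and only the area (not the height) is constrained to equal 1.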
Method 1: seaborn's kdeplot function is designed specifically for drawing kernel density estimate plots.
References: https://www.jianshu.com/p/844f66d00ac1
https://yq.aliyun.com/articles/682843

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## The GitHub URL may be unreachable from some networks (a proxy/VPN may be needed)
# df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
# df.to_csv('../data/mpg_ggplot2.csv', index=False)
df = pd.read_csv('../data/mpg_ggplot2.csv')
print(df.info())
print(df.shape)

# Draw Plot
plt.figure(figsize=(16, 10), dpi=90)  # dpi sets the output resolution (dots per inch)
# Select the cty values of rows where cyl == 4 (then 5, 6, 8) and draw one density curve per group
# fill=True replaces the deprecated shade=True (seaborn >= 0.11)
sns.kdeplot(df.loc[df['cyl'] == 4, "cty"], fill=True, color="g", label="Cyl=4", alpha=0.5)
sns.kdeplot(df.loc[df['cyl'] == 5, "cty"], fill=True, color="deeppink", label="Cyl=5", alpha=0.5)
sns.kdeplot(df.loc[df['cyl'] == 6, "cty"], fill=True, color="dodgerblue", label="Cyl=6", alpha=0.5)
sns.kdeplot(df.loc[df['cyl'] == 8, "cty"], fill=True, color="orange", label="Cyl=8", alpha=0.5)

# Decoration
plt.title('Density Plot of City Mileage by n_Cylinders', fontsize=22)
plt.legend()
plt.show()
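distplot() combines a histogram with a kernel density plot. Note that distplot is deprecated since seaborn 0.11, where histplot() (axes-level) and displot() (figure-level) replace it.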

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## The GitHub URL may be unreachable from some networks (a proxy/VPN may be needed)
# df = pd.read_csv("https://github.com/selva86/datasets/raw/master/mpg_ggplot2.csv")
# df.to_csv('../data/mpg_ggplot2.csv', index=False)
df = pd.read_csv('../data/mpg_ggplot2.csv')

# distplot combines a histogram (hist()) and a kernel density plot (kdeplot());
# the bins parameter sets the number of histogram bins
# Official docs: http://seaborn.pydata.org/generated/seaborn.distplot.html
# Parameter notes: http://www.sohu.com/a/158933070_718302
# distplot is deprecated since seaborn 0.11 and prints a deprecation warning
plt.figure(figsize=(16, 10), dpi=90)
sns.distplot(df.loc[df['cyl'] == 4, "cty"], color="g", label="Cyl=4", bins=100)
sns.distplot(df.loc[df['cyl'] == 5, "cty"], color="deeppink", label="Cyl=5", bins=10)
plt.legend()
plt.show()
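Given a set of continuous values, split the range into a number of bins, count how many values fall into each bin, and draw the resulting histogram together with a fitted curve.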
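The binning-and-counting step itself can be illustrated with np.histogram. The ten values below are made up for the example; bins=4 with range=(1, 5) gives equal-width bins [1, 2), [2, 3), [3, 4) and [4, 5], where the last bin also includes its right edge.

```python
import numpy as np

# Hypothetical continuous data
values = np.array([1.2, 1.9, 2.1, 2.4, 3.3, 3.5, 3.8, 4.0, 4.6, 5.0])

# Split [1, 5] into 4 equal-width bins and count the values in each
counts, edges = np.histogram(values, bins=4, range=(1, 5))
print(edges)   # [1. 2. 3. 4. 5.]
print(counts)  # [2 2 3 3]
```

Passing density=True instead returns counts / (n × bin width), i.e. heights whose bars have total area 1, the same scale a fitted density curve uses.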
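Method 2: seaborn's distplot does this quickly. By default the fitted curve is not a normal curve but a kernel density estimate, which follows the actual distribution of the data more closely; the fit parameter can be used to overlay a fitted normal curve as well.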

import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm  # needed only for the fit=norm option mentioned below
from sklearn import datasets

sns.set(style="ticks")
iris = datasets.load_iris()  # load the iris dataset
x = iris.data[:, 0]  # first column of the ndarray (sepal length in cm)

sns.set_palette("hls")  # set the color palette for all plots, using the hls color space
# sns.distplot(x, color="r", bins=100, kde=True)  # add hist=False to draw only the curve
# hist and kde both default to True; they control whether the histogram and the fitted curve are drawn
# fit=norm fits a normal distribution and overlays it, e.g. sns.distplot(x, fit=norm)
sns.distplot(x, bins=30, hist=True,
             kde_kws={'color': 'green', 'lw': 3, 'label': 'x'},
             hist_kws={'color': 'red', 'alpha': 0.2})
plt.show()
Official documentation: http://seaborn.pydata.org/generated/seaborn.distplot.html?highlight=distplot#seaborn.distplot
Reference: https://www.jianshu.com/p/65395b00adbc
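Method 3: round the values to one or two decimal places with round(), count them with groupby, and plot the counts. The result is far worse than with the first method.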

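import matplotlib.pyplot as plt

# f_train is the author's own DataFrame (not shown); VAR00007 is a numeric column
# Round each value to 1 decimal place so nearby values fall into the same group
f_train['VAR00007'] = f_train['VAR00007'].round(1)
# Count the rows for each rounded value
freq = f_train.groupby('VAR00007')['VAR00007'].agg(['count']).reset_index()
freq = freq.sort_values('VAR00007')  # sort_values returns a new frame, so assign it
values = freq['VAR00007'].tolist()  # rounded values (x-axis)
counts = freq['count'].tolist()  # occurrences of each value (y-axis)
plt.scatter(values, counts)
plt.show()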
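The round-then-count steps of method 3 can also be written as one pandas chain. In this minimal sketch, the Series s is a made-up stand-in for f_train['VAR00007']. Rounding to one decimal acts like fixed bins of width 0.1 centred on multiples of 0.1, with no smoothing, which is one plausible reason the result looks rougher than a KDE.

```python
import pandas as pd

# Hypothetical stand-in for f_train['VAR00007']
s = pd.Series([0.12, 0.14, 0.18, 0.21, 0.23, 0.24, 0.31])

# Round to 1 decimal, count each rounded value, then order by value instead of by count
freq = s.round(1).value_counts().sort_index()
print(freq)
```

Here 0.12 and 0.14 round to 0.1, the next four values round to 0.2, and 0.31 rounds to 0.3, giving counts 2, 4 and 1.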