Python Study Notes: Descriptive Statistics with describe


1. Introduction

Calling data.describe() is a convenient way to print summary statistics for a dataset.

It also accepts parameters for finer control:

DataFrame.describe(percentiles=[0.1,0.2,0.5,0.75],
                  include=None,
                  exclude=None)

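Parameters:

percentiles -- numbers between 0 and 1; the corresponding percentiles are included in the output (default: [0.25, 0.5, 0.75])
include -- the data types to include in the result (default None: for a mixed-type DataFrame only numeric columns are summarized; 'all' covers every column)
exclude -- the data types to omit from the result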

2. Hands-on

  • Default statistics
import pandas as pd
import numpy as np

series = pd.Series(np.random.randn(100))
series.describe()
'''
count    100.000000  number of non-null values
mean      -0.049944  mean
std        0.967943  standard deviation
min       -2.692278  minimum
25%       -0.717809  25th percentile
50%       -0.061116  median
75%        0.682023  75th percentile
max        1.825730  maximum
dtype: float64
'''
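To make the meaning of each row concrete, the sketch below recomputes the default statistics by hand and compares them with describe(). The fixed seed and the name manual are assumptions added for illustration and are not part of the original notes.

```python
import numpy as np
import pandas as pd

# Arbitrary fixed seed so the sketch is reproducible (an assumption, not from the notes)
rng = np.random.default_rng(0)
series = pd.Series(rng.standard_normal(100))

summary = series.describe()

# Recompute each default statistic directly
manual = pd.Series({
    "count": float(series.count()),  # number of non-null values
    "mean": series.mean(),
    "std": series.std(),             # sample standard deviation (ddof=1)
    "min": series.min(),
    "25%": series.quantile(0.25),
    "50%": series.quantile(0.50),    # median
    "75%": series.quantile(0.75),
    "max": series.max(),
})

# True when every hand-computed value matches describe()
print(np.allclose(summary.to_numpy(dtype=float), manual.to_numpy(dtype=float)))
```

Note that std here is the sample standard deviation, which is what Series.std() returns by default.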
  • The percentiles parameter
series.describe(percentiles=[0.05,0.25,0.3,0.7,0.8])
'''
count    100.000000
mean      -0.049944
std        0.967943
min       -2.692278
5%        -1.617615
25%       -0.717809
30%       -0.574646
50%       -0.061116
70%        0.543954
80%        0.776378
max        1.825730
dtype: float64
'''
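As a rough check on what the percentile rows mean, the sketch below looks up each requested percentile in the describe() output and compares it with Series.quantile(). The seed and the names percentiles and matches are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Arbitrary fixed seed, for reproducibility only
rng = np.random.default_rng(0)
series = pd.Series(rng.standard_normal(100))

percentiles = [0.05, 0.25, 0.3, 0.7, 0.8]
summary = series.describe(percentiles=percentiles)

# Each requested percentile shows up as a row labelled like "5%" or "30%"
matches = {}
for p in percentiles:
    label = f"{p:.0%}"  # 0.05 -> "5%", 0.3 -> "30%"
    matches[label] = bool(np.isclose(summary[label], series.quantile(p)))

print(matches)
```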
  • The include parameter
df = pd.DataFrame({"class":["語文","語文","語文","語文","語文","數學","數學","數學","數學","數學"],
                  "name":["小明","小蘇","小周","小孫","小王","小明","小蘇","小周","小孫","小王"],
                  "score":[137,125,125,115,115,80,111,130,130,140]})
df

# By default, only numeric columns are summarized; the two calls below give the same result
df.describe()
df.describe(include=[np.number])
'''
            score
count   10.000000
mean   120.800000
std     17.203359
min     80.000000
25%    115.000000
50%    125.000000
75%    130.000000
max    140.000000
'''

# Summary statistics for discrete (object-dtype) columns
df.describe(include=['O'])
df.describe(include=[object])
'''
       class name
count     10   10      non-null count
unique     2    5      number of distinct values
top       數學   小孫   most frequent value
freq       5    2      frequency of the most frequent value
'''
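The unique, top, and freq rows can be related to nunique() and value_counts(). The sketch below uses small hypothetical data (not the table above), chosen so that each column has a single most frequent value, and builds the columns with dtype=object explicitly so that include=[object] selects them.

```python
import pandas as pd

# Hypothetical data; each column has one clear most frequent value
df = pd.DataFrame({
    "class": pd.Series(["math", "math", "math", "art", "art"], dtype=object),
    "name": pd.Series(["Ann", "Bob", "Ann", "Ann", "Cid"], dtype=object),
})

summary = df.describe(include=[object])

for col in df.columns:
    counts = df[col].value_counts()  # sorted by frequency, most frequent first
    print(col,
          summary.loc["unique", col] == df[col].nunique(),
          summary.loc["top", col] == counts.index[0],
          summary.loc["freq", col] == counts.iloc[0])
```

When several values are tied for most frequent, which one is reported as top may depend on the pandas version, so it is safer not to rely on it in that case.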

# 'all' summarizes every column
df.describe(include='all')
'''
       class name       score
count     10   10   10.000000
unique     2    5         NaN
top       數學   小孫         NaN
freq       5    2         NaN
mean     NaN  NaN  120.800000
std      NaN  NaN   17.203359
min      NaN  NaN   80.000000
25%      NaN  NaN  115.000000
50%      NaN  NaN  125.000000
75%      NaN  NaN  130.000000
max      NaN  NaN  140.000000
'''
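With include='all', statistics that do not apply to a column's type are filled with NaN. The sketch below, on small hypothetical data, checks two such cells and transposes the result so that each original column becomes one row, which can be easier to read for wide tables. The name per_column is an illustrative assumption.

```python
import pandas as pd

# Hypothetical data: one text column and one numeric column
df = pd.DataFrame({
    "class": ["A", "A", "B", "B"],
    "score": [90, 80, 70, 60],
})

summary = df.describe(include="all")

# Statistics that do not apply to a column are NaN
print(pd.isna(summary.loc["mean", "class"]))    # no mean for a text column
print(pd.isna(summary.loc["unique", "score"]))  # unique is not reported for a numeric column

# Transpose: one row per original column
per_column = summary.T
print(per_column[["count", "mean"]])
```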
  • The exclude parameter
# Exclude columns of the given dtype from the summary
df.describe(exclude='O')
'''
            score
count   10.000000
mean   120.800000
std     17.203359
min     80.000000
25%    115.000000
50%    125.000000
75%    130.000000
max    140.000000
'''
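One way to think about exclude is as a column filter applied before the summary. The sketch below, on hypothetical data, compares describe(exclude=...) with calling select_dtypes(exclude=...) first and then describe(). The text column is created with dtype=object explicitly so that the object filter applies to it.

```python
import pandas as pd

# Hypothetical data: one object column and two numeric columns
df = pd.DataFrame({
    "name": pd.Series(["Ann", "Bob", "Cid"], dtype=object),
    "score": [90, 80, 70],
    "hours": [1.5, 2.0, 2.5],
})

by_exclude = df.describe(exclude=[object])
by_select = df.select_dtypes(exclude=[object]).describe()

# Both routes summarize only the numeric columns
print(list(by_exclude.columns))
print(by_exclude.equals(by_select))
```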

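Reference: pandas.DataFrame.describe

Reference: A detailed explanation of the parameters of the pandas describe function

Reference: [Python] The parameters of pandas describe explained in detail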

