Python Study Notes: Descriptive Statistics with describe

Reposted article, 2021-11-16 13:04, Python

1. Introduction

Calling data.describe() is a convenient way to print summary statistics for a dataset.

The method also accepts parameters for finer control:

DataFrame.describe(percentiles=[0.1,0.2,0.5,0.75],
                  include=None,
                  exclude=None)

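Parameters:

percentiles -- a list of numbers between 0 and 1; the corresponding percentiles are included in the output (default: 0.25, 0.5, 0.75)
include -- the data types to include in the summary (default None: numeric columns only; 'all' covers every column)
exclude -- the data types to leave out of the summary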
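As a minimal sketch of how the three parameters are passed, the snippet below uses a small made-up table. The column names `city` and `temp` and all values are hypothetical and are not part of the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, for illustration only
demo = pd.DataFrame({
    "city": pd.Series(["A", "B", "A", "C"], dtype=object),
    "temp": [21.5, 19.0, 23.5, 18.0],
})

# percentiles: request the 10th and 90th percentiles
custom = demo.describe(percentiles=[0.1, 0.9])

# include: restrict the summary to numeric columns
numeric_only = demo.describe(include=[np.number])

# exclude: drop numeric columns, leaving the non-numeric ones
non_numeric = demo.describe(exclude=[np.number])

print(custom.index.tolist())
print(numeric_only.columns.tolist())
print(non_numeric.columns.tolist())
```

The row labels of `custom` show how each requested fraction becomes a percentage label such as `10%`.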

2. Hands-on examples

Default statistics

import pandas as pd
import numpy as np

# 100 draws from a standard normal distribution; no seed is set, so the values differ on every run
series = pd.Series(np.random.randn(100))
series.describe()
'''
count    100.000000  number of non-null values
mean      -0.049944  mean
std        0.967943  sample standard deviation
min       -2.692278  minimum
25%       -0.717809  25th percentile
50%       -0.061116  median
75%        0.682023  75th percentile
max        1.825730  maximum
dtype: float64
'''
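Each row of the default summary corresponds to an ordinary Series method. The sketch below recomputes the rows one at a time and compares them with describe(). The seeded generator and the variable names are assumptions added for reproducibility, not part of the original notes.

```python
import numpy as np
import pandas as pd

# Seeded generator so the numbers are reproducible (the seed is arbitrary)
rng = np.random.default_rng(0)
s = pd.Series(rng.standard_normal(100))

summary = s.describe()

# The same figures computed one at a time
manual = pd.Series({
    "count": s.count(),        # number of non-null values
    "mean": s.mean(),
    "std": s.std(),            # sample standard deviation (ddof=1)
    "min": s.min(),
    "25%": s.quantile(0.25),
    "50%": s.median(),
    "75%": s.quantile(0.75),
    "max": s.max(),
}, dtype="float64")

print(np.allclose(summary.values, manual.values))
```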

The percentiles parameter

series.describe(percentiles=[0.05,0.25,0.3,0.7,0.8])
'''
count    100.000000
mean      -0.049944
std        0.967943
min       -2.692278
5%        -1.617615
25%       -0.717809
30%       -0.574646
50%       -0.061116
70%        0.543954
80%        0.776378
max        1.825730
dtype: float64
'''
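The values passed to percentiles must be fractions between 0 and 1, not percentages. A short sketch on a made-up series of the integers 1 to 100 (chosen only because the percentiles are easy to check by hand) shows both the labels and the error raised for out-of-range values:

```python
import pandas as pd

s = pd.Series(range(1, 101))  # the integers 1..100

# Fractions are turned into percentage labels in the output index
out = s.describe(percentiles=[0.05, 0.95])
print(out.loc[["5%", "95%"]])

# Passing 5 and 95 instead of 0.05 and 0.95 is rejected
raised = False
try:
    s.describe(percentiles=[5, 95])
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```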

The include parameter

# Sample data: two subjects (語文 = Chinese, 數學 = Math), each with scores for the same five students
df = pd.DataFrame({"class":["語文","語文","語文","語文","語文","數學","數學","數學","數學","數學"],
                  "name":["小明","小蘇","小周","小孫","小王","小明","小蘇","小周","小孫","小王"],
                  "score":[137,125,125,115,115,80,111,130,130,140]})
df

# By default, only numeric columns are summarized; the two calls below are equivalent
df.describe()
df.describe(include=[np.number])
'''
            score
count   10.000000
mean   120.800000
std     17.203359
min     80.000000
25%    115.000000
50%    125.000000
75%    130.000000
max    140.000000
'''

# Summary statistics for categorical (object-dtype) columns; 'O' is shorthand for object
df.describe(include=['O'])
df.describe(include=[object])
'''
       class name
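count     10   10      non-null count
unique     2    5      number of distinct values
top       數學   小孫   most frequent value (ties are broken arbitrarily)
freq       5    2      frequency of the most frequent value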
'''

# include='all' summarizes every column; statistics that do not apply to a column are NaN
df.describe(include='all')
'''
       class name       score
count     10   10   10.000000
unique     2    5         NaN
top       數學   小孫         NaN
freq       5    2         NaN
mean     NaN  NaN  120.800000
std      NaN  NaN   17.203359
min      NaN  NaN   80.000000
25%      NaN  NaN  115.000000
50%      NaN  NaN  125.000000
75%      NaN  NaN  130.000000
max      NaN  NaN  140.000000
'''
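The include filter behaves like selecting columns by dtype before summarizing. The following is a minimal sketch on a shortened, hypothetical version of the score table (English labels and only three rows, chosen for brevity), not a statement about how pandas implements it internally:

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical version of the score table
df_small = pd.DataFrame({
    "class": pd.Series(["Chinese", "Chinese", "Math"], dtype=object),
    "score": [137, 125, 80],
})

# include=[np.number] gives the same result as filtering the columns first
via_include = df_small.describe(include=[np.number])
via_select = df_small.select_dtypes(include=[np.number]).describe()
print(via_include.equals(via_select))

# include='all' reports every statistic for every column; statistics that
# do not apply to a column's dtype are filled with NaN
everything = df_small.describe(include="all")
print(everything.loc[["unique", "mean"]])
```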

The exclude parameter

# Exclude columns of the given dtype (here object) from the summary
df.describe(exclude='O')
'''
            score
count   10.000000
mean   120.800000
std     17.203359
min     80.000000
25%    115.000000
50%    125.000000
75%    130.000000
max    140.000000
'''
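When a table holds only numeric and object columns, excluding one kind gives the same summary as including the other. A small sketch, assuming a hypothetical table with the made-up columns `name`, `score`, and `height`:

```python
import numpy as np
import pandas as pd

# Hypothetical table with one object column and two numeric columns
people = pd.DataFrame({
    "name": pd.Series(["Ann", "Bob", "Ann"], dtype=object),
    "score": [90, 85, 70],
    "height": [1.62, 1.80, 1.62],
})

# Dropping the numeric columns leaves only the object column ...
without_numbers = people.describe(exclude=[np.number])
# ... which, for this table, matches asking for object columns directly
only_objects = people.describe(include=[object])

print(without_numbers.columns.tolist())
print(without_numbers.equals(only_objects))
```

Using exclude is handy when it is easier to name the column types you do not want than to list the ones you do.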

