Python study notes: stacking pandas data with stack, unstack, and reshape

Reposted article, 2021-09-18 18:50, Python

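In grouped summary data, stack() and unstack() are the key operations for working with hierarchical indexes.

A hierarchical index splits an index into several levels; both the row index and the column index can be hierarchical.

Hierarchical data usually comes in one of two shapes: the table (wide format) and the "curly-brace" structure (long format).

A table has labels along both the rows and the columns; the curly-brace structure has labels in only one direction, as a (multi-level) row index running down a single column of values.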

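In practice, using stack() and unstack() comes down to remembering:

stack: turns data from the "table" structure into the "curly-brace" structure, i.e. it moves the column index into the row index.
unstack: turns data from the "curly-brace" structure back into the "table" structure, i.e. it moves one level of the row index into the column index. With a multi-level index, both functions act on the innermost level by default; use the level parameter to choose a different level.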
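A minimal sketch of the two directions on a tiny, made-up 2 × 2 table (the labels r1, r2, x, y are arbitrary, not from the original):

```python
import pandas as pd

# A made-up "table" (wide format): labels on both the rows and the columns
wide = pd.DataFrame([[1, 2], [3, 4]], index=["r1", "r2"], columns=["x", "y"])

# stack: the column labels move into the row index -> "curly-brace" (long) Series
long = wide.stack()
print(list(long.index))  # [('r1', 'x'), ('r1', 'y'), ('r2', 'x'), ('r2', 'y')]
print(long.tolist())     # [1, 2, 3, 4]

# unstack: the inner row level moves back into the columns -> the table again
print(long.unstack().equals(wide))  # True
```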

[Tip] With stack(), the level you pass disappears from the columns and reappears in the rows.

[Tip] With unstack(), the level you pass disappears from the rows and reappears in the columns.
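To see both tips in action, here is a sketch with hypothetical level names ("store", "year", "metric" are invented for illustration). Passing a level by name makes it easy to track where each level ends up:

```python
import pandas as pd

# Hypothetical data: two column levels, "year" (outer) and "metric" (inner)
cols = pd.MultiIndex.from_product([["2020", "2021"], ["cost", "sales"]],
                                  names=["year", "metric"])
df = pd.DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]],
                  index=pd.Index(["A", "B"], name="store"),
                  columns=cols)

# stack(level="year"): "year" leaves the columns and becomes a row level
stacked = df.stack(level="year")
print(list(stacked.index.names))    # ['store', 'year']
print(list(stacked.columns.names))  # ['metric']

# unstack(level="store"): "store" leaves the rows and becomes a column level
wide = stacked.unstack(level="store")
print(list(wide.index.names))       # ['year']
print(list(wide.columns.names))     # ['metric', 'store']
```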

1. stack: stacking

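When the columns have a single level, stack() returns a Series with a two-level row index; call reset_index() if you want to turn it back into an ordinary DataFrame. When the columns are a MultiIndex, stacking one level returns a DataFrame instead.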
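A quick check of the return types described above, using small made-up frames. Recent pandas releases (2.1 and later) may print a FutureWarning about a new stack implementation; for data without missing values like this it should not change the results:

```python
import numpy as np
import pandas as pd

# Single-level columns: stack() returns a Series with a two-level row index
flat = pd.DataFrame(np.arange(4).reshape(2, 2), columns=["a", "b"])
s = flat.stack()
print(type(s).__name__)  # Series

# reset_index() turns that Series back into an ordinary DataFrame
print(s.reset_index().columns.tolist())  # ['level_0', 'level_1', 0]

# MultiIndex columns: stacking one level leaves the other level as columns
multi = pd.DataFrame([[1, 2]],
                     columns=pd.MultiIndex.from_tuples([("w", "kg"), ("w", "lb")]))
print(type(multi.stack()).__name__)  # DataFrame
```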

Syntax:

DataFrame.stack(level=-1, dropna=True)

Single-level index

# Build a test dataset
import pandas as pd
import numpy as np
df_size = 10
df = pd.DataFrame({
    'a': np.random.rand(df_size),
    'b': np.random.rand(df_size),
    'c': np.random.rand(df_size),
    'd': np.random.rand(df_size),
    'e': np.random.rand(df_size)
    })
print(df)

# With no arguments, every column is stacked
data = df.stack() # 1-D Series with a two-level row index (first two rows shown below)
'''
0  a    0.374002
   b    0.289687
   c    0.720090
   d    0.645252
   e    0.063648
1  a    0.012059
   b    0.228809
   c    0.018861
   d    0.511085
   e    0.002751
dtype: float64
'''

# Reset the index (first five rows shown below)
df.stack().reset_index()
'''
    level_0 level_1         0
0         0       a  0.374002
1         0       b  0.289687
2         0       c  0.720090
3         0       d  0.645252
4         0       e  0.063648
'''

stack moves the DataFrame's column labels into the row index as a new inner (second) level, producing a Series with a hierarchical index.
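A sketch that inspects the two row levels produced by stack and names them before resetting, so the result gets readable column names instead of level_0, level_1, 0. The names "row", "col" and "value" are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["a", "b"])
s = df.stack()

# Level 0 holds the original row labels, level 1 the former column labels
print(s.index.nlevels)                                 # 2
print(s.index.get_level_values(1).unique().tolist())  # ['a', 'b']

# Name the index levels and the values, then reset to a flat DataFrame
tidy = s.rename_axis(["row", "col"]).reset_index(name="value")
print(tidy.columns.tolist())  # ['row', 'col', 'value']
```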

Multi-level index

multicol1 = pd.MultiIndex.from_tuples([('weight', 'kg'),
                                       ('weight', 'pounds')])
df_multi = pd.DataFrame([[1, 2], [2, 4]],
                        index=['cat', 'dog'],
                        columns=multicol1)

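df_multi.stack()         # stacks the inner column level ('kg'/'pounds')
df_multi.stack(level=1)  # same as above: the inner column level
df_multi.stack(level=0)  # stacks the outer (first) column level ('weight')

# dropna=True (the default) drops result rows that contain only missing values
df_multi.stack(dropna=True)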
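To see what the level argument changes, this sketch rebuilds df_multi and prints the row index and remaining columns of each result:

```python
import pandas as pd

multicol1 = pd.MultiIndex.from_tuples([("weight", "kg"), ("weight", "pounds")])
df_multi = pd.DataFrame([[1, 2], [2, 4]], index=["cat", "dog"], columns=multicol1)

inner = df_multi.stack()         # inner level ("kg"/"pounds") moves to the rows
outer = df_multi.stack(level=0)  # outer level ("weight") moves to the rows

print(inner.columns.tolist())  # ['weight']
print(list(inner.index))       # [('cat', 'kg'), ('cat', 'pounds'), ('dog', 'kg'), ('dog', 'pounds')]
print(outer.columns.tolist())  # ['kg', 'pounds']
print(list(outer.index))       # [('cat', 'weight'), ('dog', 'weight')]
```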

2. unstack: unstacking

Syntax:

DataFrame.unstack(level=-1, fill_value=None)
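fill_value only matters when some row/column combination is absent from the long data. This sketch uses a hypothetical Series in which the pair ("y", "b") is missing:

```python
import pandas as pd

# Hypothetical long-format data in which the combination ("y", "b") is missing
s = pd.Series([1, 2, 3],
              index=pd.MultiIndex.from_tuples([("x", "a"), ("x", "b"), ("y", "a")]))

plain = s.unstack()               # default level=-1: "a"/"b" become columns; the gap becomes NaN
filled = s.unstack(fill_value=0)  # same layout, but the gap is filled with 0
outer = s.unstack(level=0)        # level=0: "x"/"y" become the columns instead

print(plain)
print(filled)
print(outer.columns.tolist())     # ['x', 'y']
```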

Single-level index

# Unstack: restores the original DataFrame
df.stack().unstack()

# Use the level parameter to choose which row level to unstack
df.stack().unstack(level=0)
'''
          0         1         2  ...         7         8         9
a  0.521016  0.349603  0.140595  ...  0.578615  0.629479  0.896016
b  0.043503  0.540825  0.379667  ...  0.570826  0.484303  0.922657
c  0.674632  0.044395  0.931385  ...  0.974338  0.228876  0.081472
d  0.960165  0.859809  0.713214  ...  0.247970  0.665914  0.653477
e  0.330087  0.453380  0.293309  ...  0.885709  0.591437  0.842542
'''

# fill_value replaces missing values created by unstacking
# (this frame has none, so the result is unchanged)
df.stack().unstack(level=0, fill_value=0)

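unstack moves the inner (second) level of the generated row index back into the columns (the innermost level is the default, level=-1), restoring the original DataFrame.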
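A short check of the round trip, on a random frame shaped like the df above (seeded here for reproducibility). It also shows that unstack(level=0), as in the earlier output, amounts to a transpose for data without missing values:

```python
import numpy as np
import pandas as pd

# Random test frame shaped like the df used above (10 rows, columns a-e)
df = pd.DataFrame(np.random.default_rng(0).random((10, 5)), columns=list("abcde"))

# Default unstack (level=-1) moves the former column labels back to the columns
restored = df.stack().unstack()
print(restored.equals(df))  # True

# unstack(level=0) moves the original row labels to the columns instead,
# which for a frame without missing values amounts to a transpose
flipped = df.stack().unstack(level=0)
print(flipped.equals(df.T))  # True
```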

Multi-level index

df_multi2 = df_multi.stack(level=0)
df_multi2.unstack(level=0)
'''
        kg     pounds    
       cat dog    cat dog
weight   1   2      2   4
'''
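Unstacking the other row level instead brings "weight" back into the columns, but as the inner column level. This sketch adds a swaplevel call (an extra step, not part of the original notes) to recover df_multi exactly:

```python
import pandas as pd

multicol1 = pd.MultiIndex.from_tuples([("weight", "kg"), ("weight", "pounds")])
df_multi = pd.DataFrame([[1, 2], [2, 4]], index=["cat", "dog"], columns=multicol1)
df_multi2 = df_multi.stack(level=0)  # rows: (cat, weight), (dog, weight)

# Unstacking the inner row level ("weight") puts it back in the columns,
# but as the inner column level
back = df_multi2.unstack(level=1)
print(back.columns.tolist())  # [('kg', 'weight'), ('pounds', 'weight')]

# swaplevel(axis=1) swaps the two column levels: ('weight', 'kg'), ('weight', 'pounds')
restored = back.swaplevel(axis=1)
print(restored.equals(df_multi))  # True
```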

3. reshape: reshaping

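Reshaping data. In the example below, reshape is NumPy's ndarray.reshape, which lays the 12 values out as a 3 × 4 grid before they are wrapped in a DataFrame; current pandas versions do not provide a Series.reshape method.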

import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(12).reshape(3,4),
                    index=pd.Index(['street1','street2','street3']),
                    columns=pd.Index(['store1','store2','store3','store4']))
print(data)
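stack and NumPy's reshape walk the values in the same row-major order; the difference is that stack keeps the street/store labels as a row index, while reshape works on bare arrays. A sketch comparing the two on the same data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(12).reshape(3, 4),
                    index=pd.Index(['street1', 'street2', 'street3']),
                    columns=pd.Index(['store1', 'store2', 'store3', 'store4']))

# stack() walks the table row by row, the same order as NumPy's reshape(-1)
stacked = data.stack()
print(stacked.to_numpy().tolist() == data.to_numpy().reshape(-1).tolist())  # True

# The NumPy route back: reshape the 12 stacked values into a 3 x 4 grid (labels are lost)
grid = stacked.to_numpy().reshape(3, 4)
print(grid[1].tolist())  # [4, 5, 6, 7]
```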


