Reshaping with Hierarchical Indexing
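Hierarchical indexing provides a consistent way to rearrange the data in a DataFrame. There are two primary methods:
stack: rotates (pivots) the columns of the data into the rows
unstack: pivots the rows of the data into the columns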
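Take a DataFrame as an example (the session assumes import numpy as np, import pandas as pd, and from pandas import DataFrame, Series):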
In [4]: data = DataFrame(np.arange(6).reshape((2, 3)),
   ...:                  index=pd.Index(['Ohio', 'Colorado'], name='state'),
   ...:                  columns=pd.Index(['one', 'two', 'three'], name='number'))

In [5]: data
Out[5]:
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

In [6]: data.stack()  # rotate the column index into the row index
Out[6]:
state     number
Ohio      one       0
          two       1
          three     2
Colorado  one       3
          two       4
          three     5
dtype: int32

In [7]: data.unstack()  # rotate the row index into the column index
Out[7]:
number  state
one     Ohio        0
        Colorado    3
two     Ohio        1
        Colorado    4
three   Ohio        2
        Colorado    5
dtype: int32

In [9]: data.unstack().index
Out[9]:
MultiIndex(levels=[['one', 'two', 'three'], ['Ohio', 'Colorado']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=['number', 'state'])
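Since stack moves the column labels into the rows, unstacking the level it created should undo it. The sketch below is a minimal round-trip check, assuming a recent pandas and using pd.DataFrame in place of the bare DataFrame import; the names stacked and restored are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the session above
data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

stacked = data.stack()        # Series indexed by (state, number)
restored = stacked.unstack()  # unstack the innermost level, 'number', back into columns

print(list(stacked.index.names))  # ['state', 'number']
print(restored)
```

Note that newer pandas versions (2.1 and later) may emit a FutureWarning about the stack implementation; the result is the same here.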
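For a DataFrame like this one, whose row and column indexes each have only a single level, both unstack and stack return a Series.
A Series has only the unstack method; there is no Series.stack.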
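A quick way to confirm both points is to inspect the return types and the available methods. This is a small sketch that rebuilds the same frame; the variable s is a hypothetical name.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

# With single-level row and column indexes, both rotations collapse the frame to a Series
print(type(data.stack()).__name__)    # Series
print(type(data.unstack()).__name__)  # Series

s = data.stack()
print(hasattr(s, 'unstack'))  # True
print(hasattr(s, 'stack'))    # False: Series defines no stack method
```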
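By default, unstack operates on the innermost level; pass a level number or a level name to unstack a different level. In the examples below, result is data.stack() from above: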
In [21]: result.unstack(0)
Out[21]:
state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5

In [22]: result.unstack()
Out[22]:
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5

In [23]: result.unstack('state')
Out[23]:
state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5
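The level number and the level name are interchangeable ways of selecting the same level. The sketch below checks that equivalence, assuming result is data.stack() as in the session; by_number and by_name are illustrative names.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()  # index levels: 0 = 'state', 1 = 'number'

by_number = result.unstack(0)     # select the outer level by position
by_name = result.unstack('state') # select the same level by name
print(by_number.equals(by_name))  # True

# With no argument, unstack uses the innermost level, here 'number'
print(result.unstack().equals(result.unstack('number')))  # True
print(by_number.loc['one', 'Ohio'])  # 0
```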
If not every value of the level can be found in every group, unstack introduces missing values:
In [24]: s1 = Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])

In [25]: s2 = Series([4, 5, 6], index=['c', 'd', 'e'])

In [26]: data2 = pd.concat([s1, s2], keys=['one', 'two'])

In [27]: data2
Out[27]:
one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64

In [28]: data2.unstack()
Out[28]:
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0

In [29]: data2.unstack(0)
Out[29]:
   one  two
a  0.0  NaN
b  1.0  NaN
c  2.0  4.0
d  3.0  5.0
e  NaN  6.0
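The NaN entries are exactly the (outer, inner) label pairs absent from data2, and they are why the values turn into floats. unstack also accepts a fill_value argument that substitutes a value for those holes. A minimal sketch; choosing 0 as the filler is purely illustrative.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

# Missing combinations (one, e), (two, a), (two, b) become NaN
wide = data2.unstack()
print(wide.isna().sum().sum())  # 3

# fill_value puts a placeholder in the missing combinations instead of NaN
wide_filled = data2.unstack(fill_value=0)
print(wide_filled)
```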
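By contrast, stack filters out missing values by default. (This is the classic behavior; pandas 2.1 added a reimplementation, selected with future_stack=True, that does not drop them.)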
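Stacking the unstacked frame should give back the original seven entries. Because whether stack itself drops NaN depends on the pandas version, this sketch calls dropna() explicitly so the result is the same under either behavior; the names stacked and restored are illustrative.

```python
import pandas as pd

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])

wide = data2.unstack()   # 2 x 5 frame with three NaN holes
stacked = wide.stack()   # classic stack drops the NaN entries; newer implementations may keep them
restored = stacked.dropna()  # explicit dropna makes the result version-independent

print(len(restored))       # 7
print(restored.tolist())   # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

The values stay float64 after the round trip, because unstack already converted them to hold NaN.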
When you pivot a DataFrame, the level being rotated becomes the lowest level, that is, the innermost level, of the axis it moves to.
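To see this, give the stacked data a second column so the frame has both a row MultiIndex and a named column index. In this sketch, the 'left'/'right' columns and the 'side' column-index name are hypothetical additions for illustration; the point is where the moved level ends up.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()

# Hypothetical two-column frame indexed by (state, number)
df = pd.DataFrame({'left': result, 'right': result + 5},
                  columns=pd.Index(['left', 'right'], name='side'))

wide = df.unstack('state')
print(list(wide.columns.names))  # ['side', 'state']: 'state' becomes the innermost column level

stacked_side = wide.stack('side')
print(list(stacked_side.index.names))  # ['number', 'side']: 'side' becomes the innermost row level
```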