The difference between the transform and apply functions in pandas

This article is a repost. 2020-04-23 15:42. Tags: pandas, python

There are two major differences between the transform and apply groupby methods.

apply implicitly passes all the columns for each group as a DataFrame to the custom function, while transform passes each column for each group as a Series to the custom function.

The custom function passed to apply can return a scalar, or a Series or DataFrame (or numpy array or even list). The custom function passed to transform must return a sequence (a one-dimensional Series, array or list) the same length as the group.

So, transform works on just one Series at a time and apply works on the entire DataFrame at once.

Source: https://stackoverflow.com/questions/27517425/apply-vs-transform-on-a-group-object#
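A minimal sketch of the first difference, assuming a small toy frame (the names toy, spy_apply and spy_transform are made up for illustration). It records the type of object each method hands to the custom function. Some pandas versions also probe a transform function with the whole group as an internal shortcut, so the sketch only checks that Series were passed, not that nothing else was.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration.
toy = pd.DataFrame({'state': ['Florida', 'Florida', 'Texas', 'Texas'],
                    'a': [4, 5, 1, 3],
                    'b': [6, 10, 3, 11]})

seen_by_apply = []      # type names received through apply
seen_by_transform = []  # type names received through transform


def spy_apply(group):
    # Record what apply passed in, then aggregate the group.
    seen_by_apply.append(type(group).__name__)
    return group.sum()


def spy_transform(values):
    # Record what transform passed in, then return a same-length result.
    seen_by_transform.append(type(values).__name__)
    return values - values.mean()


grouped = toy.groupby('state')[['a', 'b']]
grouped.apply(spy_apply)
grouped.transform(spy_transform)

print(set(seen_by_apply))             # type names seen by the apply function
print('Series' in seen_by_transform)  # whether transform passed single columns
```

Selecting the columns a and b before calling apply keeps the grouping column out of the groups, which avoids version-dependent behaviour around grouping columns.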

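The transform function:

1. It operates on only one Series at a time. A function that subtracts column 'b' from column 'a', for example, raises an exception;

2. It must return a one-dimensional sequence with the same length (number of rows) as the group;

3. Returning a single scalar also works, as in .transform(sum)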
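The two allowed return shapes can be sketched as follows, assuming a toy frame and a single selected column (the names toy and by_state are illustrative, not from the original post). A same-length result is used as is, and a scalar result is repeated on every row of its group.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration.
toy = pd.DataFrame({'state': ['Florida', 'Florida', 'Texas', 'Texas'],
                    'a': [4, 5, 1, 3]})
by_state = toy.groupby('state')['a']

# Same-length result: subtract each group's mean from its own rows.
demeaned = by_state.transform(lambda s: s - s.mean())

# Scalar result: the group total is repeated on every row of the group.
totals = by_state.transform(lambda s: s.sum())

print(demeaned.tolist())  # [-0.5, 0.5, -1.0, 1.0]
print(totals.tolist())    # [9, 9, 4, 4]
```

In both cases the result has one entry per row of the original data, which is what distinguishes transform from an aggregation.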

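The apply function:

1. Unlike transform, which operates on one Series at a time, apply works on the entire DataFrame;

2. apply implicitly passes all the columns of each group to the custom function as a DataFrame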
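A short sketch of what this allows, assuming a toy frame (toy, dot and summary are illustrative names). Because the custom function receives the whole group, it can combine columns, and the shape of the result follows what the function returns.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration.
toy = pd.DataFrame({'state': ['Florida', 'Florida', 'Texas', 'Texas'],
                    'a': [4, 5, 1, 3],
                    'b': [6, 10, 3, 11]})
grouped = toy.groupby('state')[['a', 'b']]

# The function sees both columns at once, so it can combine them.
# Returning a scalar gives one value per group.
dot = grouped.apply(lambda g: (g['a'] * g['b']).sum())

# Returning a Series with named entries gives one row per group.
summary = grouped.apply(
    lambda g: pd.Series({'a_max': g['a'].max(), 'b_min': g['b'].min()})
)

print(dot)
print(summary)
```

Here dot is a Series indexed by state and summary is a DataFrame with one row per state.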

Example:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'state': ['Florida', 'Florida', 'Texas', 'Texas'],
                     'a': [4, 5, 1, 3],
                     'b': [6, 10, 3, 11]})
print(data)
# Output as printed in the original post; newer pandas versions
# list the columns in insertion order (state, a, b).
#    a   b    state
# 0  4   6  Florida
# 1  5  10  Florida
# 2  1   3    Texas
# 3  3  11    Texas


def sub_two(X):
    # Needs two columns at once, so it only works with apply
    return X['a'] - X['b']


data1 = data.groupby(data['state']).apply(sub_two)  # using transform here raises an error
print(data1)
# state
# Florida  0   -2
#          1   -5
# Texas    2   -2
#          3   -8
# dtype: int64
```

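When the function returns a single scalar, transform can be used as well.

The outputs of transform and apply take different forms: transform returns a result with the same number of rows as the original data, while apply aggregates to one row per group.

In this case, the output of apply conveys the information more clearly: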

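```python
def group_sum(x):
    return x.sum()


data3 = data.groupby(data['state']).transform(group_sum)  # same number of rows as the data
print(data3)
#    a   b
# 0  9  16
# 1  9  16
# 2  4  14
# 3  4  14

# With apply, however:
data4 = data.groupby(data['state']).apply(group_sum)
print(data4)
# Output as printed in the original post; the column order and the
# presence of the state column vary with the pandas version.
#          a   b           state
# state
# Florida  9  16  FloridaFlorida
# Texas    4  14      TexasTexas
```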

The other difference is that transform must return a single dimensional sequence the same size as the group. In this particular instance, each group has two rows, so transform must return a sequence of two rows. If it does not then an error is raised:
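A minimal sketch of that failure, assuming a toy frame in which every group has two rows (toy and return_three are illustrative names). The exact exception type and message depend on the pandas version, so the sketch catches a broad Exception and simply prints what was raised.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two rows per group.
toy = pd.DataFrame({'state': ['Florida', 'Florida', 'Texas', 'Texas'],
                    'a': [4, 5, 1, 3]})


def return_three(s):
    # Always returns three values, although each group has only two rows.
    return np.array([1, 2, 3])


try:
    toy.groupby('state')['a'].transform(return_three)
    raised = False
except Exception as exc:  # exact type and message vary by pandas version
    raised = True
    print(type(exc).__name__, exc)
```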

Example 2:

```python
import numpy as np
import pandas as pd

np.random.seed(666)
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
print(df)
#      A      B         C         D
# 0  foo    one  0.824188  0.640573
# 1  bar    one  0.479966 -0.786443
# 2  foo    two  1.173468  0.608870
# 3  bar  three  0.909048 -0.931012
# 4  foo    two -0.571721  0.978222
# 5  bar    two -0.109497 -0.736918
# 6  foo    one  0.019028 -0.298733
# 7  foo  three -0.943761 -0.460587


def zscore(x):
    # z-score: centre on the group mean, scale by the group standard deviation
    return (x - x.mean()) / x.std()


# Select the numeric columns C and D explicitly; older pandas versions
# dropped the non-numeric column B automatically.
print(df.groupby('A')[['C', 'D']].transform(zscore))
print(df.groupby('A')[['C', 'D']].apply(zscore))  # in this form both give the same values
# df.groupby('A').apply(zscore) raises an error, because apply passes the
# whole DataFrame, string columns included

# Group by column A, then compute the sum of column C within each group
df['sum_c'] = df.groupby('A')['C'].transform('sum')
df = df.sort_values('A')
print(df)
#      A      B         C         D     sum_c
# 1  bar    one  0.479966 -0.786443  1.279517
# 3  bar  three  0.909048 -0.931012  1.279517
# 5  bar    two -0.109497 -0.736918  1.279517
# 0  foo    one  0.824188  0.640573  0.501202
# 2  foo    two  1.173468  0.608870  0.501202
# 4  foo    two -0.571721  0.978222  0.501202
# 6  foo    one  0.019028 -0.298733  0.501202
# 7  foo  three -0.943761 -0.460587  0.501202

print(df.groupby('A')['C'].apply(sum))
# A
# bar    1.279517
# foo    0.501202
# Name: C, dtype: float64
```

The function passed to transform must return a number, a row, or an object with the same shape as its argument. If it returns a number, that number is assigned to every element in the group; if it returns a row, the row is broadcast to all the rows in the group.
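One practical consequence of this broadcasting, sketched under the assumption of a toy frame (toy, group_total, share and above_mean are illustrative names): the result of transform lines up with the original rows, so it can be combined with the source column or used as a row filter without any merge step.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration.
toy = pd.DataFrame({'state': ['Florida', 'Florida', 'Texas', 'Texas'],
                    'a': [4, 5, 1, 3]})

# transform returns one value per original row, aligned on the original
# index, so it can be combined with the source column directly.
group_total = toy.groupby('state')['a'].transform('sum')
toy['share'] = toy['a'] / group_total

# The same broadcast result also works as a row filter:
# keep the rows whose value is above their own group's mean.
above_mean = toy[toy['a'] > toy.groupby('state')['a'].transform('mean')]

print(toy)
print(above_mean.index.tolist())  # [1, 3]
```

Passing the string names 'sum' and 'mean' rather than Python callables lets pandas use its built-in group reductions.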

Reference: https://stackoverflow.com/questions/27517425/apply-vs-transform-on-a-group-object#
