The difference between the transform and apply functions in pandas

Reposted article, 2020-04-23 15:42. Tags: pandas, python

There are two major differences between the transform and apply groupby methods.

apply implicitly passes all the columns for each group as a DataFrame to the custom function, while transform passes each column for each group as a Series to the custom function.

The custom function passed to apply can return a scalar, or a Series or DataFrame (or numpy array or even list). The custom function passed to transform must return a sequence (a one-dimensional Series, array or list) the same length as the group.

So, transform works on just one Series at a time and apply works on the entire DataFrame at once.

Source: https://stackoverflow.com/questions/27517425/apply-vs-transform-on-a-group-object#

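The transform function:

1. It transforms only one Series at a time; if the custom function subtracts column 'b' from column 'a', an exception is raised.

2. It must return a one-dimensional sequence with as many rows as the group.

3. Returning a single scalar also works, e.g. .transform(sum); the scalar is then broadcast to every row of the group.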

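The apply function:

1. Unlike transform, which handles one Series at a time, apply operates on the entire DataFrame of each group.

2. apply implicitly passes all the columns of each group, as a DataFrame, to the custom function.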
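One way to check the points above is to record the type of object each method hands to the custom function. This is a minimal sketch, not part of the original post: the helper names record_for_apply and record_for_transform are made up for illustration, and the columns are selected explicitly so that the grouping column stays out of the way. Some pandas versions may also try the function on the whole group as an internal optimization of transform, so the sketch only checks that Series objects were passed, not that nothing else was.

```python
import pandas as pd

data = pd.DataFrame({
    "state": ["Florida", "Florida", "Texas", "Texas"],
    "a": [4, 5, 1, 3],
    "b": [6, 10, 3, 11],
})

apply_types = set()      # type names seen by the function under apply
transform_types = set()  # type names seen by the function under transform


def record_for_apply(x):
    # Hypothetical helper: remember what apply passed in, then aggregate.
    apply_types.add(type(x).__name__)
    return x.sum()


def record_for_transform(x):
    # Hypothetical helper: remember what transform passed in, then center.
    transform_types.add(type(x).__name__)
    return x - x.mean()


grouped = data.groupby("state")[["a", "b"]]
apply_result = grouped.apply(record_for_apply)
transform_result = grouped.transform(record_for_transform)

print(apply_types)                  # apply receives each group as a DataFrame
print("Series" in transform_types)  # transform receives individual columns
print(apply_result.shape, transform_result.shape)
```

The shapes printed at the end show the other half of the story: apply with an aggregating function yields one row per group, while transform keeps one row per original row.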

Example:

    import numpy as np
    import pandas as pd

    data = pd.DataFrame({'a': [4, 5, 1, 3],
                         'b': [6, 10, 3, 11],
                         'state': ['Florida', 'Florida', 'Texas', 'Texas']})
    print(data)
    #    a   b    state
    # 0  4   6  Florida
    # 1  5  10  Florida
    # 2  1   3    Texas
    # 3  3  11    Texas

    def sub_two(X):
        # two columns are accessed by name, so X must be a DataFrame
        return X['a'] - X['b']

    data1 = data.groupby(data['state']).apply(sub_two)  # using transform here raises an error
    print(data1)
    # state
    # Florida  0   -2
    #          1   -5
    # Texas    2   -2
    #          3   -8
    # dtype: int64
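To see the failure mentioned in the comment above, the same subtraction can be attempted with transform. The sketch below is illustrative and not from the original post. It catches a generic Exception because the exact exception type and message may differ between pandas versions, and the variable name transform_failed is made up for the example.

```python
import pandas as pd

data = pd.DataFrame({
    "state": ["Florida", "Florida", "Texas", "Texas"],
    "a": [4, 5, 1, 3],
    "b": [6, 10, 3, 11],
})


def sub_two(X):
    # Needs both columns at once, so X has to be a DataFrame.
    return X["a"] - X["b"]


try:
    data.groupby("state").transform(sub_two)
    transform_failed = False
except Exception as exc:
    # Under transform, X is a single column (a Series), so X["a"] is looked
    # up among the row labels and is not found.
    transform_failed = True
    print(type(exc).__name__, exc)

# A row-wise difference does not depend on the groups, so it can also be
# computed directly, without groupby.
diff = data["a"] - data["b"]
print(diff.tolist())  # [-2, -5, -2, -8]
```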

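transform can also be used when the function returns a single scalar.

In the code below, transform and apply produce output of different shapes: transform returns as many rows as the original data, while apply aggregates to one row per group.

In this case, the result from apply conveys the information more clearly.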

    def group_sum(x):
        return x.sum()

    data3 = data.groupby(data['state']).transform(group_sum)  # returns as many rows as the data
    print(data3)
    #    a   b
    # 0  9  16
    # 1  9  16
    # 2  4  14
    # 3  4  14

    # with apply, however, the result is aggregated per group
    data4 = data.groupby(data['state']).apply(group_sum)
    print(data4)
    #          a   b           state
    # state
    # Florida  9  16  FloridaFlorida
    # Texas    4  14      TexasTexas

The other difference is that transform must return a one-dimensional sequence the same size as the group. In this particular instance, each group has two rows, so transform must return a sequence of two rows. If it does not, an error is raised.
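The length requirement can be illustrated with a function that deliberately returns too many values. This is a sketch under stated assumptions rather than code from the original post: wrong_length and right_length are hypothetical helpers, a single column is selected to keep the case simple, and a generic Exception is caught because the exact error type and message may vary across pandas versions.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "state": ["Florida", "Florida", "Texas", "Texas"],
    "a": [4, 5, 1, 3],
    "b": [6, 10, 3, 11],
})


def wrong_length(x):
    # Each group has two rows, but this returns three values.
    return np.arange(3)


try:
    data.groupby("state")["a"].transform(wrong_length)
    length_error = None
except Exception as exc:
    length_error = exc
    print(type(exc).__name__, exc)


def right_length(x):
    # One output value per input row: the rank within the group.
    return x.rank()


ranks = data.groupby("state")["a"].transform(right_length)
print(ranks.tolist())  # [1.0, 2.0, 1.0, 2.0]
```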

Example 2:

    np.random.seed(666)
    df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                       'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                       'C': np.random.randn(8),
                       'D': np.random.randn(8)})
    print(df)
    #      A      B         C         D
    # 0  foo    one  0.824188  0.640573
    # 1  bar    one  0.479966 -0.786443
    # 2  foo    two  1.173468  0.608870
    # 3  bar  three  0.909048 -0.931012
    # 4  foo    two -0.571721  0.978222
    # 5  bar    two -0.109497 -0.736918
    # 6  foo    one  0.019028 -0.298733
    # 7  foo  three -0.943761 -0.460587

    def zscore(x):
        # z-score: subtract the group mean, divide by the group standard deviation
        return (x - x.mean()) / x.std()

    # Older pandas versions picked out the numeric columns C and D automatically;
    # selecting them explicitly does not rely on that behavior.
    print(df.groupby('A')[['C', 'D']].transform(zscore))
    print(df.groupby('A')[['C', 'D']].apply(zscore))  # in this form both calls produce the same values
    # df.groupby('A').apply(zscore) raises an error, because apply operates on the whole DataFrame

    df['sum_c'] = df.groupby('A')['C'].transform(sum)  # group by column A, then sum column C within each group
    df = df.sort_values('A')
    print(df)
    #      A      B         C         D     sum_c
    # 1  bar    one  0.479966 -0.786443  1.279517
    # 3  bar  three  0.909048 -0.931012  1.279517
    # 5  bar    two -0.109497 -0.736918  1.279517
    # 0  foo    one  0.824188  0.640573  0.501202
    # 2  foo    two  1.173468  0.608870  0.501202
    # 4  foo    two -0.571721  0.978222  0.501202
    # 6  foo    one  0.019028 -0.298733  0.501202
    # 7  foo  three -0.943761 -0.460587  0.501202

    print(df.groupby('A')['C'].apply(sum))
    # A
    # bar    1.279517
    # foo    0.501202
    # Name: C, dtype: float64

The function passed to transform must return a number, a row, or something with the same shape as the argument. If it is a number, that number is assigned to every element in the group; if it is a row, it is broadcast to all the rows in the group.
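The scalar and same-shape cases described above can be sketched on a single column. The data frame and the variable names by_state, group_max and centered are illustrative assumptions, not taken from the original post.

```python
import pandas as pd

data = pd.DataFrame({
    "state": ["Florida", "Florida", "Texas", "Texas"],
    "a": [4, 5, 1, 3],
    "b": [6, 10, 3, 11],
})

by_state = data.groupby("state")["a"]

# Scalar return: the group maximum is repeated for every row of the group.
group_max = by_state.transform(lambda x: x.max())
print(group_max.tolist())  # [5, 5, 3, 3]

# Same-shape return: each value minus its group mean.
centered = by_state.transform(lambda x: x - x.mean())
print(centered.tolist())  # [-0.5, 0.5, -1.0, 1.0]
```

In both cases the result has one entry per original row and keeps the original index.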

Reference: https://stackoverflow.com/questions/27517425/apply-vs-transform-on-a-group-object#
