The pandas.apply() function

Reposted from the original article (2019-12-07 22:39, tagged ML).

1. Introduction

apply is arguably the most flexible function in pandas. Its signature is:

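DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)

This is the signature from older pandas releases. The broadcast and reduce parameters were deprecated in pandas 0.23 in favor of result_type and removed in pandas 1.0, so in current pandas the core signature is DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwargs).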
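Besides func and axis, the signature lists args, **kwds and raw. The following is a minimal sketch of what they do: args is passed positionally to func after the Series, extra keyword arguments are forwarded to func, and raw=True hands func a plain NumPy ndarray instead of a Series. The function scale_and_shift and the data are hypothetical, chosen only for illustration; the sketch avoids broadcast and reduce so it also runs on current pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})

def scale_and_shift(col, factor, offset=0.0):
    # hypothetical helper: col is one column (default axis=0),
    # factor arrives via args, offset via the forwarded keyword arguments
    return col * factor + offset

# args=(2,) is passed after the column; offset=1.0 is forwarded as a keyword.
# func returns a Series per column, so apply returns a DataFrame.
scaled = df.apply(scale_and_shift, args=(2,), offset=1.0)

# raw=False (default): each column arrives as a pandas Series
is_series = df.apply(lambda col: isinstance(col, pd.Series))
# raw=True: each column arrives as a NumPy ndarray
is_ndarray = df.apply(lambda col: isinstance(col, np.ndarray), raw=True)

print(scaled)
print(is_series.all(), is_ndarray.all())
```

raw=True mainly helps when func only needs the numbers, since it skips building a Series for every column or row.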

The most important parameter is the first one, func. It takes a function, much like a function pointer in C/C++.
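To make the function-pointer analogy concrete, here is a small sketch (col_range and the data are hypothetical): func can be a named function or an anonymous lambda. The named function is passed without parentheses, so apply receives the function object itself and calls it once per column.

```python
import pandas as pd

df = pd.DataFrame({"b": [1.0, 4.0, 2.0], "d": [3.0, -1.0, 5.0]})

def col_range(s):
    # s is one column passed in as a Series (axis=0 by default)
    return s.max() - s.min()

by_name = df.apply(col_range)                      # pass the function object, no ()
by_lambda = df.apply(lambda s: s.max() - s.min())  # or an equivalent lambda

print(by_name)
print(by_name.equals(by_lambda))
```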

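You write this function yourself, and what it receives depends on axis. With axis=1, for example, each row is passed to your function as a Series whose index holds the column labels. Inside the function you compute something from the row's fields and return a result. apply iterates over every row of the DataFrame and, when the function returns a single value per row, combines all the results into a Series, which it returns.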
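A minimal sketch of the axis=1 case just described, using a hypothetical price/quantity table: each row arrives as a Series, the function reads two fields by column label and returns one number, and apply collects those numbers into a Series indexed like the rows.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [2.5, 4.0, 10.0], "qty": [4, 3, 1]},
    index=["apple", "pear", "melon"],
)

def line_total(row):
    # hypothetical helper: row is one row as a Series,
    # so fields are read by their column labels
    return row["price"] * row["qty"]

totals = df.apply(line_total, axis=1)  # one scalar per row -> Series
print(totals)
```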

2. Example

import numpy as np
import pandas as pd

if __name__ == '__main__':
    f = lambda x : x.max() - x.min()
    df = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['utah', 'ohio', 'texas', 'oregon']) # columns sets the column labels, index sets the row labels
    print(df)

    t1 = df.apply(f) # df.apply(function, axis=0): axis defaults to 0, so each column is passed to the function as a Series
    print(t1)

    t2 = df.apply(f, axis=1) # axis=1: each row is passed to the function as a Series
    print(t2)

Output (the data is random, so your numbers will differ):

               b         d         e
utah    1.950737  0.318299  0.387724
ohio    1.584464 -0.082965  0.984757
texas   0.477283 -2.774454 -0.532181
oregon -0.851359 -0.654882  1.026698


b    2.802096
d    3.092753
e    1.558879
dtype: float64

utah      1.632438
ohio      1.667428
texas     3.251737
oregon    1.878057
dtype: float64
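The range computed by f can also be obtained without apply, because DataFrame already provides max and min reductions along either axis. This sketch is not from the original article: it rebuilds a similar frame with a fixed random seed (an assumption, so reruns are reproducible) and checks that the apply results match the built-in reductions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so reruns give the same frame
df = pd.DataFrame(rng.standard_normal((4, 3)), columns=list("bde"),
                  index=["utah", "ohio", "texas", "oregon"])

f = lambda x: x.max() - x.min()

# apply-based, as in the example
t1 = df.apply(f)          # per column
t2 = df.apply(f, axis=1)  # per row

# equivalent built-in reductions, no Python-level function calls per column/row
v1 = df.max() - df.min()
v2 = df.max(axis=1) - df.min(axis=1)

print(t1)
print(np.allclose(t1, v1), np.allclose(t2, v2))
```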

3. Performance comparison

import numpy as np
import pandas as pd

def my_test(a, b):
    return a + b

if __name__ == '__main__':
    df = pd.DataFrame({'a':np.random.randn(6),
                       'b':['foo', 'bar'] * 3,
                       'c':np.random.randn(6)})

    print(df)

    # Method 1: apply with axis=1 calls my_test once per row (a Python-level loop)
    df['value1'] = df.apply(lambda row: my_test(row['a'], row['c']), axis=1)
    print(df)

    # Method 2: vectorized addition of the two columns
    df['value2'] = df['a'] + df['c']
    print(df)

Output (random data, so your numbers will differ):

          a    b         c
0 -1.745471  foo  0.723341
1 -0.378998  bar  0.229188
2 -1.468866  foo  0.788046
3 -1.323347  bar  0.323051
4 -1.894372  foo  2.216768
5 -0.649059  bar  0.858149


          a    b         c    value1
0 -1.745471  foo  0.723341 -1.022130
1 -0.378998  bar  0.229188 -0.149810
2 -1.468866  foo  0.788046 -0.680820
3 -1.323347  bar  0.323051 -1.000296
4 -1.894372  foo  2.216768  0.322396
5 -0.649059  bar  0.858149  0.209089


          a    b         c    value1    value2
0 -1.745471  foo  0.723341 -1.022130 -1.022130
1 -0.378998  bar  0.229188 -0.149810 -0.149810
2 -1.468866  foo  0.788046 -0.680820 -0.680820
3 -1.323347  bar  0.323051 -1.000296 -1.000296
4 -1.894372  foo  2.216768  0.322396  0.322396
5 -0.649059  bar  0.858149  0.209089  0.209089

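Note: when the data is large and the logic is simple, prefer method 2 (vectorized column arithmetic) over method 1 (apply with axis=1). On a dataset of a few hundred MB, method 1 took me about 200 s and method 2 about 10 s.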
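The timings in the note come from the author's own data. One way to reproduce the comparison on synthetic data is sketched below; the row count n and the use of time.perf_counter are assumptions made for illustration, and absolute times will vary with hardware and pandas version. Only the relative gap is meaningful.

```python
import time

import numpy as np
import pandas as pd

n = 50_000  # assumed, illustrative size; increase it to see the gap widen
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.standard_normal(n), "c": rng.standard_normal(n)})

def my_test(a, b):
    return a + b

# Method 1: apply with axis=1 builds a Series for every row and calls Python code per row
start = time.perf_counter()
method1 = df.apply(lambda row: my_test(row["a"], row["c"]), axis=1)
t_apply = time.perf_counter() - start

# Method 2: one vectorized addition over whole columns
start = time.perf_counter()
method2 = df["a"] + df["c"]
t_vec = time.perf_counter() - start

print(f"method 1 (apply, axis=1): {t_apply:.3f} s")
print(f"method 2 (vectorized):    {t_vec:.3f} s")
print("same result:", np.allclose(method1, method2))
```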

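Copyright notice: This is an original article by CSDN blogger 「鴻燕藏鋒」, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original and this notice.
Original link: https://blog.csdn.net/yanjiangdi/article/details/94764562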

