DataFrame (9): DataFrame Operations: Cumulative Statistics Functions


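1. Function overview

pandas provides four cumulative statistics methods on Series and DataFrame objects: cumsum() (running sum), cummax() (running maximum), cummin() (running minimum), and cumprod() (running product). Each returns a result of the same length as its input, where element n summarizes the first n elements.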
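As a quick orientation, the sketch below applies the four methods to one small Series side by side. The sample values and the names `s` and `overview` are made up for illustration and do not come from the original article.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration.
s = pd.Series([3, 1, 4, 2])

overview = pd.DataFrame({
    "value": s,
    "cumsum": s.cumsum(),    # running total
    "cummax": s.cummax(),    # running maximum
    "cummin": s.cummin(),    # running minimum
    "cumprod": s.cumprod(),  # running product
})
print(overview)
```

Every column has the same length as the input, which is what distinguishes these methods from aggregations such as sum() or max().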

2. Sample data

import pandas as pd

# display() is provided by Jupyter/IPython; use print(df) in a plain script.
df = pd.DataFrame({"id":["00{}".format(i) for i in range(1,10)],
                   "score":[2,3,4,4,5,6,7,7,8]})
display(df)

The result is a 9-row table with id values 001 through 009 and the scores listed above.

            

3. cumsum(): cumulative sum of the first n elements

(This is a very important function.)

df = pd.DataFrame({"id":["00{}".format(i) for i in range(1,10)],
                   "score":[2,3,4,4,5,6,7,7,8]})
display(df)

df["cumsum"] = df["score"].cumsum(axis=0)
display(df)

Result: the new cumsum column holds the running totals 2, 5, 9, 13, 18, 24, 31, 38, 46.
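To make the meaning of "cumulative sum" concrete, the following sketch compares cumsum() with a hand-written running total over the same scores, and also shows how a missing value behaves under the default skipna=True. The names `manual`, `matches`, `with_gap`, and `gap_result` are illustrative only.

```python
import numpy as np
import pandas as pd

scores = pd.Series([2, 3, 4, 4, 5, 6, 7, 7, 8])

# Manual running total, for comparison with cumsum().
running, manual = 0, []
for value in scores.tolist():
    running += value
    manual.append(running)

matches = scores.cumsum().tolist() == manual
print(matches)

# With the default skipna=True, a NaN stays NaN in the output,
# but the running total continues past it.
with_gap = pd.Series([1.0, np.nan, 2.0])
gap_result = with_gap.cumsum()
print(gap_result)
```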

            

1) cumsum(): cumulative sum within groups

df = pd.DataFrame({"id":["001","001","002","003","001","002","002","003","003"],
                   "score":[2,3,4,4,5,6,7,7,8]})
display(df)

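# Select the score column explicitly so the result is a single Series.
df["group_cumsum"] = df.groupby("id")["score"].cumsum()
# A stable sort keeps the original row order within each id.
df = df.sort_values(by=["id"], kind="stable")
display(df)

Result: within each id, the running totals are 2, 5, 10 for 001; 4, 10, 17 for 002; and 4, 11, 19 for 003.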

            

4. cummax(): maximum of the first n elements

df = pd.DataFrame({"score":[1,2,1,5,2,6,3,7,1]})
display(df)

df["running_max"] = df["score"].cummax(axis=0)
display(df)

Result: the new column is 1, 2, 2, 5, 5, 6, 6, 7, 7.

            

1) cummax(): maximum of the first n elements within groups

df = pd.DataFrame({"id":["001","001","002","003","001","002","002","003","003"],
                   "date":["2020-01-01","2020-01-09","2020-01-05","2020-01-03",
                           "2020-01-08","2020-01-07","2020-01-02","2020-01-04","2020-01-06"],
                   "score":[1,2,1,5,2,6,3,7,1]})
display(df)

# Sort by id and date so each group's running maximum follows chronological order.
df = df.sort_values(by=["id","date"],ascending=[True,True])
df["running_max"] = df.groupby("id")["score"].cummax()
display(df)

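Result: after sorting by id and date, the running maximum is 1, 2, 2 for 001; 3, 3, 6 for 002; and 5, 7, 7 for 003.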

            

Note: cummin() is used in exactly the same way as cummax(); try it yourself.
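For readers who want a starting point, here is a minimal sketch of cummin(), both on a whole column and within groups. The data and the column names `running_min` and `group_running_min` are hypothetical and were chosen so that the minimum actually changes along the way.

```python
import pandas as pd

# Hypothetical data for illustration only.
df_min = pd.DataFrame({"id": ["a", "a", "b", "b", "a", "b"],
                       "score": [5, 3, 4, 6, 1, 2]})

# Running minimum over the whole column.
df_min["running_min"] = df_min["score"].cummin()

# Running minimum restarted for each id.
df_min["group_running_min"] = df_min.groupby("id")["score"].cummin()
print(df_min)
```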

5. cumprod(): cumulative product of the first n elements

df = pd.DataFrame({"score":[1,2,1,5,2,6,3,7,1]})
display(df)

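# Use cumprod() here; cummax() would return the running maximum instead.
df["running_product"] = df["score"].cumprod(axis=0)
display(df)

Result: the new column is 1, 2, 2, 10, 20, 120, 360, 2520, 2520.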

            

Note: the grouped cumulative product works the same way as the grouped examples above.
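A minimal sketch of the grouped case, reusing the id and score values from the grouped cummax() example; the names `df_prod` and `group_cumprod` are illustrative, not part of the original article.

```python
import pandas as pd

df_prod = pd.DataFrame({
    "id": ["001", "001", "002", "003", "001", "002", "002", "003", "003"],
    "score": [1, 2, 1, 5, 2, 6, 3, 7, 1],
})

# Running product restarted for each id, computed in the current row order.
df_prod["group_cumprod"] = df_prod.groupby("id")["score"].cumprod()
print(df_prod)
```

As with the other grouped methods, the result is aligned to the original index, so it can be assigned straight back as a new column.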

