pandas: working with previous and next rows

Reposted article, published 2016-07-25 13:14. Tags: pandas / python / row iteration / shift / mask

1. Conditions on the current and previous rows

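Question:

Hello everyone, I have a DataFrame (columns: 產品 = product, 數據1 = data 1, 數據2 = data 2):

產品  數據1  數據2
A     1      2
B     4      5
C     6      3

I want to find every row where 數據1 > 數據2 in that row AND 數據1 < 數據2 in the row above it.
In the example above, 6 > 3 and 4 < 5, so the output should be product C.
How should I write this?

Answer: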

import pandas as pd

df = pd.DataFrame({'產品': ['A','B','C'],
                   '數據1': [1, 4, 6],
                   '數據2': [2, 5, 3]})
# previous row: 數據1 < 數據2, and current row: 數據1 > 數據2
df[(df['數據1'].shift(1) < df['數據2'].shift(1)) & (df['數據1'].shift(0) > df['數據2'].shift(0))]['產品']

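Explanation:

The fastest way to select rows is not to iterate over them. Instead, build a mask (a boolean array) and select with df[mask].
That raises one question: how do you refer to the current row and the previous row of a DataFrame inside a single vectorized expression? The answer is shift.
shift(0): the current row
shift(1): the previous row
shift(n): the n-th row before the current one
shift(-n): the n-th row after the current one
Rows that have no such neighbour hold NaN after the shift, and ordering comparisons such as < and > against NaN evaluate to False.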
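
To make the offsets concrete, here is a minimal sketch on the product table from the question. The column names `prev_1` and `next_1` and the variable `crossed` are illustrative only and do not come from the original answer; `shift(-1)` (the next row) is shown as the mirror image of `shift(1)`.

```python
import pandas as pd

df = pd.DataFrame({'產品': ['A', 'B', 'C'],
                   '數據1': [1, 4, 6],
                   '數據2': [2, 5, 3]})

# shift(1) moves values down one position, so each row sees the previous row's value.
df['prev_1'] = df['數據1'].shift(1)    # hypothetical column name
# shift(-1) moves values up one position, so each row sees the next row's value.
df['next_1'] = df['數據1'].shift(-1)   # hypothetical column name
print(df)

# The first row has no previous row, so its shifted values are NaN and the
# "previous row" comparison is False there.
crossed = (df['數據1'].shift(1) < df['數據2'].shift(1)) & (df['數據1'] > df['數據2'])
result = df.loc[crossed, '產品'].tolist()
print(result)
```

Using `df.loc[mask, column]` selects the rows and the column in one step, which avoids chaining two indexing operations.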

To combine several conditions:
Logical AND (&):
mask = ((...) & (...))

Logical OR (|):
mask = ((...) | (...))

Logical NOT (~):
mask = ~(...)

For example:

In [75]: df = pd.DataFrame({'A':range(5), 'B':range(10,20,2)})

In [76]: df
Out[76]: 
   A   B
0  0  10
1  1  12
2  2  14
3  3  16
4  4  18

In [77]: mask = (df['A'].shift(1) + df['B'].shift(2) > 12)

In [78]: mask
Out[78]: 
0    False
1    False
2    False
3     True
4     True
dtype: bool

In [79]: df[mask]
Out[79]: 
   A   B
3  3  16
4  4  18
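
The logical operators listed earlier can be combined on the same df. The following is a minimal sketch; the mask names (`prev_a_big`, `cur_b_small`, `both`, `either`, `neither`) and the thresholds are illustrative assumptions, not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({'A': range(5), 'B': range(10, 20, 2)})

prev_a_big = df['A'].shift(1) >= 2      # previous row's A is at least 2
cur_b_small = df['B'] < 14              # current row's B is below 14

both = prev_a_big & cur_b_small         # logical AND
either = prev_a_big | cur_b_small       # logical OR
neither = ~(prev_a_big | cur_b_small)   # logical NOT applied to the OR

print(df[either])
print(df[neither])
```

When the comparisons are written inline rather than stored in variables, each one needs its own parentheses, because & and | bind more tightly than < and > in Python.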

2. Building a column from previous and next rows

Question:

If I have the following dataframe:

date A B M S
20150101 8 7 7.5 0
20150101 10 9 9.5 -1
20150102 9 8 8.5 1
20150103 11 11 11 0
20150104 11 10 10.5 0
20150105 12 10 11 -1
...

If I want to create another column 'cost' by the following rules:
if S < 0, cost = (M-B).shift(1)*S
if S > 0, cost = (M-A).shift(1)*S
if S == 0, cost=0
currently, I am using the following function:

def cost(df):
    if df[3]<0:
        return np.roll((df[2]-df[1]),1)*df[3]
    elif df[3]>0:
        return np.roll((df[2]-df[0]),1)*df[3]
    else:
        return 0
df['cost']=df.apply(cost,axis=0)

Is there any other way to do it? can I somehow use pandas shift function in user defined functions? thanks.

Answer:

import numpy as np
import pandas as pd
 
df = pd.DataFrame({'date': ['20150101','20150102','20150103','20150104','20150105','20150106'],
                   'A': [8,10,9,11,11,12],
                   'B': [7,9,8,11,10,10],
                   'M': [7.5,9.5,8.5,11,10.5,11],
                   'S': [0,-1,1,0,0,-1]})

df = df.reindex(columns=['date','A','B','M','S'])

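# Method 1: nested np.where with np.roll
# (np.roll wraps the last value around to row 0; shift(1) would give NaN there)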
df['cost'] = np.where(df['S'] < 0,
                      np.roll((df['M']-df['B']), 1)*df['S'],
                      np.where(df['S'] > 0,
                               np.roll((df['M']-df['A']), 1)*df['S'],
                               0)
                     )
            
# Method 2: np.select with shift(1)
M, A, B, S = [df[col] for col in 'MABS']
conditions = [S < 0, S > 0]
choices = [(M-B).shift(1)*S, (M-A).shift(1)*S]
df['cost2'] = np.select(conditions, choices, default=0)


print(df)
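
The np.roll version and the shift(1) version above agree on this data only because S is 0 in the first row. The sketch below uses a small hypothetical table (made-up values, same column meanings) in which the first row has S < 0, to show where the two differ. The column names `cost_roll` and `cost_shift` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the first row has S != 0, unlike the example above.
df = pd.DataFrame({'B': [7.0, 9.0, 8.0],
                   'M': [7.5, 10.0, 10.5],
                   'S': [-1, -1, 0]})

spread = df['M'] - df['B']                # 0.5, 1.0, 2.5

# np.roll is circular: row 0 receives the spread of the LAST row.
rolled = np.roll(spread.to_numpy(), 1) * df['S'].to_numpy()
# shift(1) is not circular: row 0 has no previous row, so it becomes NaN.
shifted = (spread.shift(1) * df['S']).to_numpy()

df['cost_roll'] = np.where(df['S'] < 0, rolled, 0)
df['cost_shift'] = np.where(df['S'] < 0, shifted, 0)
print(df)
```

Which behaviour is wanted depends on the data; if the first row should never borrow a value from the end of the table, the shift-based form is the safer choice, possibly followed by fillna to choose an explicit value for the first row.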
