pandas.DataFrame.where and mask explained

This article is a repost (see the original). 2019-11-01 15:30 · 1669 · Python data analysis

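1. Background

I have rarely used df.where; I usually reach straight for loc, apply, and similar methods instead.

Most likely the tasks I have met so far simply have not gone beyond what loc and apply can handle.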

DataFrame.where(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)

Note: Replace values in DataFrame with other where the cond is False.

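Let's look at how the official documentation explains each parameter:

The parts marked in red deserve special attention. Blog posts and worked examples rarely enumerate every possibility, so only by understanding every option the API offers can you apply it flexibly to cases nobody has written up.

In short: where evaluates a condition on a DataFrame, and wherever the condition is not met it takes the value from the other parameter instead.

It also has an inverse counterpart, mask: where replaces values where the condition is not met, while mask replaces values where the condition is met.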
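A minimal sketch of that behaviour. The DataFrame contents and the replacement value -1 are made up for illustration and are not from the original post:

```python
import pandas as pd

# Hypothetical sample data, chosen only to make the effect visible.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Keep cells where the condition holds; replace the rest with `other`.
kept_or_replaced = df.where(df > 2, other=-1)
print(kept_or_replaced)

# With the default other (NaN), cells failing the condition become missing.
kept_or_nan = df.where(df > 2)
print(kept_or_nan)
```

Only the two cells in column A that are not greater than 2 are affected; every other cell passes through unchanged.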

DataFrame.mask(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False)

Note: Replace values in DataFrame with other where the cond is True.

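Again, let's look at how the official documentation explains each parameter:

As you can see, the two methods take exactly the same parameters.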
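Because the two methods differ only in which side of the condition gets replaced, inverting the condition should turn one into the other. A small sketch of that relationship, with an illustrative Series and the replacement value -1 assumed for the example:

```python
import pandas as pd

s = pd.Series(range(5))
cond = s > 2

# where keeps values where cond is True and replaces the rest.
by_where = s.where(cond, -1)

# mask replaces values where its condition is True, so pass the inverse.
by_mask = s.mask(~cond, -1)

print(by_where.tolist())          # [-1, -1, -1, 3, 4]
print(by_where.equals(by_mask))   # True
```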

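np.where(condition, [x, y]) takes three parameters. condition (the test) is required; x and y are optional, but they must be passed together or both omitted. What are the requirements on each of the three?

condition: array_like, bool. Where True, yield x, otherwise yield y.

Put simply, the first argument must be array-like or a boolean. Where the condition is true the value from x is returned, otherwise the value from y.

x, y: array_like, optional. Values from which to choose. x, y and condition need to be broadcastable to some shape.

x and y are optional, and the only requirement on their type is that they be array-like. The result takes its elements from one or the other depending on whether the condition is True or False at each position.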
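The broadcasting requirement and the one-argument form are easier to see on a small array. The following is only a sketch; the array values and variable names are chosen for illustration:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Three-argument form: y is a scalar here and broadcasts against the condition.
picked = np.where(a % 2 == 0, a, -1)

# x can also be a row vector that broadcasts across each row of the condition.
row = np.array([10, 20, 30])
broadcast = np.where(a > 2, row, 0)

# One-argument form: returns a tuple of index arrays where the condition is True.
idx = np.where(a > 3)

print(picked.tolist())     # [[0, -1, 2], [-1, 4, -1]]
print(broadcast.tolist())  # [[0, 0, 0], [10, 20, 30]]
print(idx)
```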

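import numpy as np
import pandas as pd

s = pd.Series(range(5))

s.mask(s > 0)   # values where s > 0 are replaced with NaN

s.where(s > 0)  # values where s > 0 are kept; the rest become NaN

ss = pd.Series(range(10, 20, 2))
np.where(s > 2, s, ss)  # element-wise: take s where s > 2, otherwise ss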
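One practical difference worth noting is the return type: np.where hands back a plain NumPy array, while Series.where returns a Series that keeps the index. A short sketch reusing the same two Series (the variable names as_array and as_series are introduced here for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series(range(5))
ss = pd.Series(range(10, 20, 2))

as_array = np.where(s > 2, s, ss)   # plain NumPy array, index is dropped
as_series = s.where(s > 2, ss)      # pandas Series, index is preserved

print(type(as_array).__name__, as_array.tolist())
print(type(as_series).__name__, as_series.tolist())
```

Both contain the values 10, 12, 14, 3, 4; which form is preferable depends on whether the index matters downstream.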

Below I pass a callable as cond and another callable as other.

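df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
df

def cond1(x):
    # Condition: True where the value is divisible by 3.
    return x % 3 == 0

def mult3(x):
    # Replacement: the original value multiplied by 3.
    return x * 3

# Cells where cond1 is False are replaced by the matching cell of mult3(df).
df.where(cond1, mult3)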
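When cond or other is a callable, pandas calls it with the DataFrame itself and uses what it returns. The sketch below compares the callable form with an explicit boolean frame and replacement frame; the lambdas stand in for cond1 and mult3, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=["A", "B"])

# Callable form: each function receives the whole DataFrame.
via_callables = df.where(lambda x: x % 3 == 0, lambda x: x * 3)

# Explicit form: precomputed boolean mask and replacement frame.
via_explicit = df.where(df % 3 == 0, df * 3)

print(via_callables)
print(via_callables.equals(via_explicit))   # True
```

Values divisible by 3 (0, 3, 6, 9) are kept, and all others are tripled.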
