np.where is used differently from pd.Series.where and pd.DataFrame.where. The notes below go through each in turn and then summarize the differences:
import numpy as np
import pandas as pd
help(np.where)
Help on built-in function where in module numpy.core.multiarray:
where(...)
where(condition, [x, y])
Return elements, either from `x` or `y`, depending on `condition`.
If only `condition` is given, return ``condition.nonzero()``.
Parameters
----------
condition : array_like, bool
When True, yield `x`, otherwise yield `y`.
x, y : array_like, optional
Values from which to choose. `x`, `y` and `condition` need to be
broadcastable to some shape.
Returns
-------
out : ndarray or tuple of ndarrays
If both `x` and `y` are specified, the output array contains
elements of `x` where `condition` is True, and elements from
`y` elsewhere.
If only `condition` is given, return the tuple
``condition.nonzero()``, the indices where `condition` is True.
See Also
--------
nonzero, choose
Notes
-----
If `x` and `y` are given and input arrays are 1-D, `where` is
equivalent to::
[xv if c else yv for (c,xv,yv) in zip(condition,x,y)]
Examples
--------
>>> np.where([[True, False], [True, True]],
... [[1, 2], [3, 4]],
... [[9, 8], [7, 6]])
array([[1, 8],
[3, 4]])
>>> np.where([[0, 1], [1, 0]])
(array([0, 1]), array([1, 0]))
>>> x = np.arange(9.).reshape(3, 3)
>>> np.where( x > 5 )
(array([2, 2, 2]), array([0, 1, 2]))
>>> x[np.where( x > 3.0 )] # Note: result is 1D.
array([ 4., 5., 6., 7., 8.])
>>> np.where(x < 5, x, -1) # Note: broadcasting.
array([[ 0., 1., 2.],
[ 3., 4., -1.],
[-1., -1., -1.]])
Find the indices of elements of `x` that are in `goodvalues`.
>>> goodvalues = [3, 4, 7]
>>> ix = np.isin(x, goodvalues)
>>> ix
array([[False, False, False],
[ True, True, False],
[False, True, False]])
>>> np.where(ix)
(array([1, 1, 2]), array([0, 1, 1]))
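- Using np.where
The help text above shows that np.where takes a condition argument plus the optional arguments x and y.
Whether x and y are given, and what shapes they have, determines what np.where returns. Without x and y it is equivalent to np.nonzero: it returns a tuple of index arrays, one per dimension of condition, locating the elements that are True or non-zero. With x and y, the three arguments condition, x and y are first broadcast to a common shape (if they do not already match), and the output takes each element from x where condition is True and from y otherwise.
(1) Without the optional arguments x and y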
a=np.random.randint(0,high=2,size=(3,3));a
array([[0, 1, 1],
[1, 1, 0],
[1, 1, 0]])
np.where(a)
(array([0, 0, 1, 1, 2, 2], dtype=int64),
array([1, 2, 0, 1, 0, 1], dtype=int64))
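As a quick check of the equivalence with np.nonzero, here is a minimal sketch. It uses a fixed array with the same values as the random sample above so the result is reproducible; the names idx, rows_cols and picked are only illustrative.

```python
import numpy as np

# Fixed copy of the sample array above, so the output is reproducible.
a = np.array([[0, 1, 1],
              [1, 1, 0],
              [1, 1, 0]])

idx = np.where(a)  # tuple of index arrays, one per dimension

# With only a condition, np.where gives the same indices as np.nonzero.
same = all(np.array_equal(i, j) for i, j in zip(idx, np.nonzero(a)))

# Pair up row and column indices to get (row, col) coordinates.
rows_cols = [(int(r), int(c)) for r, c in zip(*idx)]

# The index tuple can be used directly for indexing; the result is 1-D.
picked = a[idx]
```

Zipping the two index arrays is often the most readable way to see which positions were selected.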
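(2) With x and y: the output has the shape obtained by broadcasting condition, x and y together, and each element is then picked from x or y according to condition.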
cond=np.array([True,False])
x=np.arange(6).reshape(3,2);x
array([[0, 1],
[2, 3],
[4, 5]])
y=np.array([[100,200]])
cond.shape
(2,)
x.shape
(3, 2)
y.shape
(1, 2)
So the broadcast shape should be (3, 2):
result=np.where(cond,x,y);result
array([[ 0, 200],
[ 2, 200],
[ 4, 200]])
result.shape
(3, 2)
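The broadcasting step can also be made explicit. The sketch below is one way to check it, assuming the same cond, x and y as above: np.broadcast reports the common shape, and np.broadcast_arrays expands each input so the selection can be reproduced element by element. The names common_shape and manual are illustrative only.

```python
import numpy as np

cond = np.array([True, False])      # shape (2,)
x = np.arange(6).reshape(3, 2)      # shape (3, 2)
y = np.array([[100, 200]])          # shape (1, 2)

# np.broadcast reports the common shape without building the result.
common_shape = np.broadcast(cond, x, y).shape

# Expand each input to the common shape, then select element by element.
cond_b, x_b, y_b = np.broadcast_arrays(cond, x, y)
manual = np.array([[xv if c else yv for c, xv, yv in zip(cr, xr, yr)]
                   for cr, xr, yr in zip(cond_b, x_b, y_b)])

result = np.where(cond, x, y)
```

Here manual and result hold the same values: column 0 comes from x because cond[0] is True, and column 1 comes from y because cond[1] is False.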
- where in pandas
help(pd.DataFrame.where)
Help on function where in module pandas.core.generic:
where(self, cond, other=nan, inplace=False, axis=None, level=None, errors='raise', try_cast=False, raise_on_error=None)
Return an object of same shape as self and whose corresponding
entries are from self where `cond` is True and otherwise are from
`other`.
Parameters
----------
cond : boolean NDFrame, array-like, or callable
Where `cond` is True, keep the original value. Where
False, replace with corresponding value from `other`.
If `cond` is callable, it is computed on the NDFrame and
should return boolean NDFrame or array. The callable must
not change input NDFrame (though pandas doesn't check it).
.. versionadded:: 0.18.1
A callable can be used as cond.
other : scalar, NDFrame, or callable
Entries where `cond` is False are replaced with
corresponding value from `other`.
If other is callable, it is computed on the NDFrame and
should return scalar or NDFrame. The callable must not
change input NDFrame (though pandas doesn't check it).
.. versionadded:: 0.18.1
A callable can be used as other.
inplace : boolean, default False
Whether to perform the operation in place on the data
axis : alignment axis if needed, default None
level : alignment level if needed, default None
errors : str, {'raise', 'ignore'}, default 'raise'
- ``raise`` : allow exceptions to be raised
- ``ignore`` : suppress exceptions. On error return original object
Note that currently this parameter won't affect
the results and will always coerce to a suitable dtype.
try_cast : boolean, default False
try to cast the result back to the input type (if possible),
raise_on_error : boolean, default True
Whether to raise on invalid data types (e.g. trying to where on
strings)
.. deprecated:: 0.21.0
Returns
-------
wh : same type as caller
Notes
-----
The where method is an application of the if-then idiom. For each
element in the calling DataFrame, if ``cond`` is ``True`` the
element is used; otherwise the corresponding element from the DataFrame
``other`` is used.
The signature for :func:`DataFrame.where` differs from
:func:`numpy.where`. Roughly ``df1.where(m, df2)`` is equivalent to
``np.where(m, df1, df2)``.
For further details and examples see the ``where`` documentation in
:ref:`indexing <indexing.where_mask>`.
Examples
--------
>>> s = pd.Series(range(5))
>>> s.where(s > 0)
0 NaN
1 1.0
2 2.0
3 3.0
4 4.0
>>> s.mask(s > 0)
0 0.0
1 NaN
2 NaN
3 NaN
4 NaN
>>> s.where(s > 1, 10)
0 10.0
1 10.0
2 2.0
3 3.0
4 4.0
>>> df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
>>> m = df % 3 == 0
>>> df.where(m, -df)
A B
0 0 -1
1 -2 3
2 -4 -5
3 6 -7
4 -8 9
>>> df.where(m, -df) == np.where(m, df, -df)
A B
0 True True
1 True True
2 True True
3 True True
4 True True
>>> df.where(m, -df) == df.mask(~m, -df)
A B
0 True True
1 True True
2 True True
3 True True
4 True True
See Also
--------
:func:`DataFrame.mask`
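The help text above shows that the where method of DataFrame and Series follows the if-then idiom: elements of the caller (a DataFrame or Series) are kept where cond is True and replaced with other (NaN by default) where it is False. inplace defaults to False, in which case a new DataFrame or Series with the same shape as the caller is returned; if it is True, the caller is modified in place. The mask method does exactly the opposite: it replaces the elements where cond is True.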
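A short sketch of these behaviours on a Series, using only parameters listed in the help text (cond, other, inplace). The variable names are illustrative, and details such as the resulting dtype may vary between pandas versions.

```python
import pandas as pd

s = pd.Series(range(5))

kept = s.where(s > 1)         # False positions become NaN
filled = s.where(s > 1, 10)   # False positions take the value of `other`
masked = s.mask(s > 1, 10)    # mask replaces where cond is True instead

# cond may be a callable that receives the Series and returns a boolean mask.
via_callable = s.where(lambda v: v > 1, 10)

# inplace=True modifies the caller and returns None.
t = s.copy()
ret = t.where(t > 1, 10, inplace=True)
```

After this runs, t holds the same values as filled, while s itself is unchanged because the in-place call was made on a copy.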
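- Differences between np.where and the where method of DataFrame or Series:
(1) In numpy, where is a module-level function, and ndarray objects have no where method. pandas has no module-level where function; it can only be called on DataFrame and Series objects.
(2) The condition of np.where can be an array or a boolean value, whereas the cond of the DataFrame and Series where methods can be an array or boolean values and, in addition, a callable.
(3) np.where takes x, the values to choose where condition is True, as an explicit argument. The pandas methods follow the if-then idiom: the caller itself supplies the values for True, and only the replacement for False (other) is specified.
(4) The return value of np.where has the shape obtained by broadcasting condition, x and y; the pandas methods always return an object with the same shape as the caller.
(5) The pandas methods have an inplace parameter that decides whether a new object is returned or the caller is modified in place. np.where always builds a new array, so it has no such parameter.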
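Points (3) and (4) can be seen side by side in a small sketch. It reuses the df and m from the pandas help example; col, row and grid are hypothetical names added here to show a broadcast result whose shape differs from every input.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10).reshape(-1, 2), columns=['A', 'B'])
m = df % 3 == 0

from_pandas = df.where(m, -df)     # DataFrame, keeps index and columns
from_numpy = np.where(m, df, -df)  # plain ndarray, labels are dropped

same_values = np.array_equal(from_pandas.to_numpy(), from_numpy)

# np.where can return a shape that none of its inputs has on its own,
# whereas DataFrame.where always returns the caller's shape.
col = np.arange(3).reshape(3, 1)    # shape (3, 1)
row = np.arange(4)                  # shape (4,)
grid = np.where(col > 1, col, row)  # shape (3, 4)
```

The values agree, but only the pandas result carries the row and column labels, which is usually the deciding factor when choosing between the two.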