Pitfalls in index-based lookups on pandas DataFrame and pandas Series


Looking up values in a Series instance with s[key]

When a pd.Series has an integer index, we cannot use s1[-1] to retrieve the value of its last row

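The correct approach is s1.iloc[-1] or s1.values[-1]. s1[len(s1) - 1] also works, but only when the labels are the default 0..n-1, because it is still a label lookup.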
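A minimal sketch of the three alternatives, using the same s1 as in the session further down. The second series, s_other, and its labels 10 and 20 are assumptions added here only to show when the s1[len(s1) - 1] form stops working.

```python
import pandas as pd

s1 = pd.Series([111, 222], index=range(2))

# Position-based: works regardless of what the labels are
last_by_position = s1.iloc[-1]

# Label-based: only correct because the labels happen to be 0..n-1
last_by_label = s1[len(s1) - 1]

# Go through the underlying NumPy array
last_from_array = s1.values[-1]

# Assumed counterexample: integer labels that are not 0..n-1
s_other = pd.Series([111, 222], index=[10, 20])
try:
    s_other[len(s_other) - 1]      # looks for the label 1, which does not exist
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True
```

For this reason .iloc[-1] is usually the form that survives a change of index.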

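Python's magic method __getitem__ gives a class subscript support, which means instance[key]
can retrieve the value of the element that belongs to key. The pandas Series class makes very heavy use of __getitem__, and it hides
a lot of rules inside it.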
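To make the mechanism concrete, here is a toy container with its own __getitem__. The class name LabelledValues and its attributes are hypothetical and have nothing to do with the pandas implementation; the sketch only shows how a label-only lookup turns a missing key into a KeyError.

```python
class LabelledValues:
    """Hypothetical toy container; not part of pandas."""

    def __init__(self, values, labels):
        self._values = list(values)
        self._labels = list(labels)

    def __getitem__(self, key):
        # Label lookup only: there is no fallback to positions,
        # so a key that is not a label raises KeyError.
        try:
            position = self._labels.index(key)
        except ValueError as err:
            raise KeyError(key) from err
        return self._values[position]


box = LabelledValues([111, 222], labels=[0, 1])
value_at_label_1 = box[1]

try:
    box[-1]                 # -1 is not one of the labels
    missing_key_raised = False
except KeyError:
    missing_key_raised = True
```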
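Let's dig into the source code: when the index of a Series instance s1 is integer-valued, what happens if we look it up with the key [-1]?
Following the call chain step by step:
__getitem__() calls ._get_value(-1), which in turn calls .index.get_loc(-1).
This is where the problem arises: .index._range.index(-1)
The key -1 is simply not among the labels of s1, because the index of s1 is range(2).
range.index() therefore raises ValueError, which get_loc re-raises as: KeyError: -1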
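The last two steps of that chain can be reproduced through public calls only. This is a sketch: it uses the built-in range(2) in place of the private ._range attribute, on the assumption that the two behave the same way for this lookup.

```python
import pandas as pd

s1 = pd.Series([111, 222], index=range(2))

# Plain Python: range.index raises ValueError for a value it does not contain
try:
    range(2).index(-1)
    range_error = None
except ValueError as err:
    range_error = type(err).__name__

# pandas: Index.get_loc reports the same miss as a KeyError
try:
    s1.index.get_loc(-1)
    loc_error = None
except KeyError as err:
    loc_error = type(err).__name__

# A label that does exist resolves to its position
loc_of_one = s1.index.get_loc(1)
```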

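When a pd.Series has a string index (the s2 instance, for example), s2[-1] does retrieve the value of the last row in the pandas version shown here: __getitem__ takes the _should_fallback_to_positional() branch and reads self._values[-1]. Newer pandas releases deprecate this positional fallback, so s2.iloc[-1] is the safer spelling.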
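Because the result of s2[-1] depends on the installed pandas version, the sketch below does not assume a particular outcome. It suppresses any deprecation warning and records either the value or the fact that the lookup was rejected; the variable name fallback_value is an assumption made for this example.

```python
import warnings

import pandas as pd

s2 = pd.Series([111, 222], index=list("ab"))

# Always available: explicit positional access
last_value = s2.iloc[-1]

# Version-dependent: integer key on a string index
try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")   # hide a possible deprecation warning
        fallback_value = s2[-1]
except KeyError:
    fallback_value = None                 # this pandas version has no positional fallback
```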

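Conclusion: series[key] is a powerful way to look things up, but pay attention to the index type when you use it so that you do not fall into this trap. Spelling the intent out with .iloc[] (by position) or .loc[] (by label) is more explicit.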
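A small sketch of why the explicit accessors help. The series and its labels 3, 2, 1 are made up for this example: with integer labels in a different order from the positions, the same integer means two different things.

```python
import pandas as pd

# Assumed example data: integer labels that do not match the positions
s = pd.Series([1.5, 2.5, 3.5], index=[3, 2, 1])

by_label = s.loc[1]       # the row whose label is 1
by_position = s.iloc[1]   # the second row
plain = s[1]              # integer index, so this is a label lookup
```

Here s[1] agrees with s.loc[1], not with s.iloc[1], which is easy to misread without the accessor.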

Signature: s1.__getitem__(key)
Source:   
    def __getitem__(self, key):
        key = com.apply_if_callable(key, self)

        if key is Ellipsis:
            return self

        key_is_scalar = is_scalar(key)
        if isinstance(key, (list, tuple)):
            key = unpack_1tuple(key)

        if is_integer(key) and self.index._should_fallback_to_positional():
            return self._values[key]

        elif key_is_scalar:
            return self._get_value(key)

        if is_hashable(key):
            # Otherwise index.get_value will raise InvalidIndexError
            try:
                # For labels that don't resolve as scalars like tuples and frozensets
                result = self._get_value(key)

                return result

            except KeyError:
                if isinstance(key, tuple) and isinstance(self.index, MultiIndex):
                    # We still have the corner case where a tuple is a key
                    # in the first level of our MultiIndex
                    return self._get_values_tuple(key)

        if is_iterator(key):
            key = list(key)

        if com.is_bool_indexer(key):
            key = check_bool_indexer(self.index, key)
            key = np.asarray(key, dtype=bool)
            return self._get_values(key)

        return self._get_with(key)
File:      d:\anaconda3\lib\site-packages\pandas\core\series.py
Type:      method



Signature: s1._get_value(label, takeable:bool=False)
Source:   
    def _get_value(self, label, takeable: bool = False):
        """
        Quickly retrieve single value at passed index label.

        Parameters
        ----------
        label : object
        takeable : interpret the index as indexers, default False

        Returns
        -------
        scalar value
        """
        if takeable:
            return self._values[label]

        # Similar to Index.get_value, but we do not fall back to positional
        loc = self.index.get_loc(label)
        return self.index._get_values_for_loc(self, loc, label)
File:      d:\anaconda3\lib\site-packages\pandas\core\series.py
Type:      method



s1.index.get_loc??
Signature: s1.index.get_loc(key, method=None, tolerance=None)
Source:   
    @doc(Int64Index.get_loc)
    def get_loc(self, key, method=None, tolerance=None):
        if method is None and tolerance is None:
            if is_integer(key) or (is_float(key) and key.is_integer()):
                new_key = int(key)
                try:
                    return self._range.index(new_key)
                except ValueError as err:
                    raise KeyError(key) from err
            raise KeyError(key)
        return super().get_loc(key, method=method, tolerance=tolerance)
File:      d:\anaconda3\lib\site-packages\pandas\core\indexes\range.py
Type:      method



import pandas as pd

s1 = pd.Series([111, 222], range(2))      # integer labels (RangeIndex)
s2 = pd.Series([111, 222], list('ab'))    # string labels


s1
Out[266]: 
0    111
1    222
dtype: int64

s2
Out[267]: 
a    111
b    222
dtype: int64

s2[-1]
Out[268]: 222
s1[-1]

Traceback (most recent call last):

  File "<ipython-input-269-0123e3764900>", line 1, in <module>
    s1[-1]

  File "D:\Anaconda3\lib\site-packages\pandas\core\series.py", line 882, in __getitem__
    return self._get_value(key)

  File "D:\Anaconda3\lib\site-packages\pandas\core\series.py", line 989, in _get_value
    loc = self.index.get_loc(label)

  File "D:\Anaconda3\lib\site-packages\pandas\core\indexes\range.py", line 357, in get_loc
    raise KeyError(key) from err

KeyError: -1

Looking up values in a pd.DataFrame instance with df[key]

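A df is a 2D data structure, and df[key] accepts two kinds of keys: a column label (or a list of column labels), or a slice object, which selects rows.
Its lookup rules are even more hidden and complex. In short, df[key] either selects along the column axis or slices along the row axis.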
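The two kinds of key can be sketched as follows. The frame and its column names "a" and "b" are assumptions made for this example.

```python
import pandas as pd

# Assumed example data
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

one_column = df["a"]            # single label: a Series
two_columns = df[["a", "b"]]    # list of labels: a DataFrame
first_two_rows = df[0:2]        # slice: selects rows, not columns

# A bare integer is treated as a column label, not as a row position
try:
    df[0]
    scalar_int_failed = False
except KeyError:
    scalar_int_failed = True
```

As with Series, df.loc[] and df.iloc[] state explicitly which axis and which kind of key is meant.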

