Python Data Analysis - Indexing and Selecting Data


Differences and relationships between loc, iloc and ix

loc

.loc is primarily label based, but may also be used with a boolean array.
In other words, loc selects data mainly by label.[1]

  • A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index. This use is not an integer position along the index)
  • A list or array of labels ['a', 'b', 'c']
  • A slice object with labels 'a':'f', (note that contrary to usual python slices, both the start and the stop are included!)
  • A boolean array
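As a small sketch of the four input types listed above, the block below applies each one to loc. It assumes deterministic data (np.arange instead of the random values used later in this article) so the results are predictable; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Deterministic 4x5 frame (values 0..19) instead of random numbers
test = pd.DataFrame(np.arange(20).reshape(4, 5),
                    index=['A', 'B', 'C', 'D'],
                    columns=['E', 'F', 'G', 'H', 'I'])

# A single label: row 'A' comes back as a Series
row_a = test.loc['A']

# A list of labels: rows 'A' and 'C' come back as a DataFrame
rows_ac = test.loc[['A', 'C']]

# A label slice: both endpoints 'A' and 'C' are included
rows_a_to_c = test.loc['A':'C']

# A boolean array with one entry per row: keep rows where column E > 5
mask = test['E'] > 5
rows_big_e = test.loc[mask]

print(rows_big_e.index.tolist())  # ['C', 'D']
```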

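The general form df.loc[rows, columns] still applies. The row and column parts do not have to be slices, but the comma between them matters. If the comma and the column part are left out, the selector applies to rows, so df.loc['A'] returns a row. A lone label is never treated as a column: to select a column, the row part must be given explicitly, as in df.loc[:, 'E'].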
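Here is a minimal sketch of the row/column rule just described, again assuming deterministic np.arange data rather than random values. It also reproduces the KeyError shown below for test.loc['E'].

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(np.arange(20).reshape(4, 5),
                    index=['A', 'B', 'C', 'D'],
                    columns=['E', 'F', 'G', 'H', 'I'])

# Row part only: 'A' is looked up in the index (the rows)
row_a = test.loc['A']

# To select a column, keep the comma and use ':' for "all rows"
col_e = test.loc[:, 'E']

# A row label and a column label together give a single value
cell = test.loc['A', 'E']

# A column label in the row position fails, because loc searches the index
try:
    test.loc['E']
    raised = False
except KeyError:
    raised = True

print(col_e.tolist(), cell, raised)  # [0, 5, 10, 15] 0 True
```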

import pandas as pd

import numpy as np

test=pd.DataFrame(np.random.randn(20).reshape(4,5),index=['A','B','C','D'],columns=['E','F','G','H','I'])

test
Out[4]: 
          E         F         G         H         I
A -0.833316 -1.982666  1.055594  0.781759 -0.107631
B -1.514709 -1.422883  0.204399 -0.487639 -1.652785
C -0.424735  0.400529 -0.786582  0.855885  0.059894
D  2.016221 -1.314878 -1.745535 -0.907778  0.834966

test.loc['A']
Out[5]: 
E   -0.833316
F   -1.982666
G    1.055594
H    0.781759
I   -0.107631
Name: A, dtype: float64

test.loc['E']
KeyError: 'the label [E] is not in the [index]'

# As you can see, label slices are closed intervals
test.loc['A':'B','E':'F']
Out[8]: 
          E         F
A -0.833316 -1.982666
B -1.514709 -1.422883

When slicing by label, the interval is closed: the label after the colon is included as well.
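To make the closed interval easy to check, the sketch below repeats the slice above on deterministic data (an assumption; the article uses random values) and contrasts it with an ordinary Python list slice, which excludes the stop.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(np.arange(20).reshape(4, 5),
                    index=['A', 'B', 'C', 'D'],
                    columns=['E', 'F', 'G', 'H', 'I'])

# Label slice on both axes: the stop labels 'B' and 'F' are included
sub = test.loc['A':'B', 'E':'F']
print(sub.shape)  # (2, 2)

# A plain Python slice over the same labels stops before 'B'
labels = list(test.index)
print(labels[0:1])  # ['A']
```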

iloc

.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
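In other words, iloc selects by position. Note that position slices are half-open intervals (start included, stop excluded): df.iloc[m:n] selects only the rows at positions m through n-1.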

  • An integer e.g. 5
  • A list or array of integers [4, 3, 0]
  • A slice object with ints 1:7
  • A boolean array

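A short sketch of the iloc input types listed above, including the half-open slice. It assumes deterministic np.arange data, and the boolean mask here is a plain NumPy array with one entry per row.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(np.arange(20).reshape(4, 5),
                    index=['A', 'B', 'C', 'D'],
                    columns=['E', 'F', 'G', 'H', 'I'])

# An integer: position 0 is row 'A'
first = test.iloc[0]

# A list of integers, in any order
picked = test.iloc[[3, 0]]

# An integer slice: positions 1 and 2 only; stop position 3 is excluded
middle = test.iloc[1:3]

# A boolean NumPy array with one entry per row
mask = np.array([True, False, True, False])
masked = test.iloc[mask]

print(middle.index.tolist())  # ['B', 'C']
```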
import pandas as pd

import numpy as np

test=pd.DataFrame(np.random.randn(20).reshape(4,5),index=['A','B','C','D'],columns=['E','F','G','H','I'])

test
Out[4]: 
          E         F         G         H         I
A -0.833316 -1.982666  1.055594  0.781759 -0.107631
B -1.514709 -1.422883  0.204399 -0.487639 -1.652785
C -0.424735  0.400529 -0.786582  0.855885  0.059894
D  2.016221 -1.314878 -1.745535 -0.907778  0.834966

# As you can see, this is a half-open interval: position 1 is excluded
test.iloc[0:1,0:1]
Out[10]: 
          E
A -0.833316

ix

.ix supports mixed integer and label based access. It is primarily label based, but will fall back to integer positional access unless the corresponding axis is of integer type.
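ix is a catch-all selector: it supports both position-based and label-based selection, but is primarily label based. Note that .ix was deprecated in pandas 0.20.0 and removed in pandas 1.0, so the ix examples below only run on older versions; in current pandas, use loc for labels and iloc for positions.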

import pandas as pd

import numpy as np

test=pd.DataFrame(np.random.randn(20).reshape(4,5),index=['A','B','C','D'],columns=['E','F','G','H','I'])

test
Out[4]: 
          E         F         G         H         I
A -0.833316 -1.982666  1.055594  0.781759 -0.107631
B -1.514709 -1.422883  0.204399 -0.487639 -1.652785
C -0.424735  0.400529 -0.786582  0.855885  0.059894
D  2.016221 -1.314878 -1.745535 -0.907778  0.834966

# Here `ix` works much like `loc`
test.ix['A':'B','E':'F']
Out[12]: 
          E         F
A -0.833316 -1.982666
B -1.514709 -1.422883

# And here it works like `iloc`
test.ix[0:1,0:1]
Out[11]: 
          E
A -0.833316
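Because .ix no longer exists in pandas 1.0 and later, here is a sketch of how the two ix calls above could be written with loc and iloc, plus one way to mix positions and labels. Converting one kind of index to the other with test.index[...] or test.columns.get_indexer(...) is one common idiom, not the only one; the data is deterministic by assumption.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(np.arange(20).reshape(4, 5),
                    index=['A', 'B', 'C', 'D'],
                    columns=['E', 'F', 'G', 'H', 'I'])

# test.ix['A':'B', 'E':'F'] (label based) -> loc
by_label = test.loc['A':'B', 'E':'F']

# test.ix[0:1, 0:1] (position based) -> iloc
by_position = test.iloc[0:1, 0:1]

# Mixed access: first two rows by position, columns by label.
# Option 1: turn the positions into labels, then use loc
mixed_loc = test.loc[test.index[0:2], ['E', 'F']]
# Option 2: turn the labels into positions, then use iloc
mixed_iloc = test.iloc[0:2, test.columns.get_indexer(['E', 'F'])]

print(mixed_loc.equals(mixed_iloc))  # True
```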

Note, however, that when the index or columns are integers, ix selects by label, so integer slices on that axis are closed intervals.
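The integer-label pitfall is easiest to see when the labels differ from the positions. Since .ix is gone in current pandas, this sketch uses loc, which follows the same label rule described above for ix on an integer axis; the frame df and its index [2, 4, 6, 8] are hypothetical example data.

```python
import pandas as pd

# Hypothetical data: integer labels that do not match positions
df = pd.DataFrame({'x': [10, 20, 30, 40]}, index=[2, 4, 6, 8])

# loc treats 2 and 6 as labels: closed interval -> labels 2, 4, 6
by_label = df.loc[2:6]

# iloc treats 2 and 6 as positions: half-open -> positions 2, 3 (labels 6, 8)
by_position = df.iloc[2:6]

# The same integer means different rows
label_two = df.loc[2, 'x']      # row labelled 2
position_two = df.iloc[2, 0]    # third row

print(by_label.index.tolist(), by_position.index.tolist())  # [2, 4, 6] [6, 8]
print(label_two, position_two)  # 10 30
```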

References

The official documentation turned out to be the most detailed source; it is well worth reading more of it.


  1. Official documentation: Indexing and Selecting Data

