Differences and relationships between loc, iloc, and ix
loc
.loc is primarily label based, but may also be used with a boolean array.
In other words, loc selects data mainly by label. [1]
- A single label, e.g. 5 or 'a', (note that 5 is interpreted as a label of the index. This use is not an integer position along the index)
- A list or array of labels ['a', 'b', 'c']
- A slice object with labels 'a':'f', (note that contrary to usual python slices, both the start and the stop are included!)
- A boolean array
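A minimal sketch of the four input types listed above. It assumes a deterministic stand-in for the article's test frame (np.arange instead of np.random.randn, so the values are reproducible); the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the article's random `test` frame (assumption).
test = pd.DataFrame(np.arange(20).reshape(4, 5),
                    index=['A', 'B', 'C', 'D'],
                    columns=['E', 'F', 'G', 'H', 'I'])

row_a = test.loc['A']                    # single label -> Series (one row)
rows_ab = test.loc[['A', 'C']]           # list of labels -> DataFrame
rows_a_to_c = test.loc['A':'C']          # label slice: end label 'C' is included
mask = (test['E'] > 5).to_numpy()        # plain boolean array, one entry per row
rows_masked = test.loc[mask]             # rows where the mask is True

print(row_a.tolist())            # [0, 1, 2, 3, 4]
print(list(rows_ab.index))       # ['A', 'C']
print(list(rows_a_to_c.index))   # ['A', 'B', 'C']
print(list(rows_masked.index))   # ['C', 'D']
```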
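The overall form df.loc[rows, columns] still has to be respected. You don't have to use slices inside it, but the comma in the middle is crucial. You may leave out the comma (and the column part), in which case the key selects a row. You cannot, however, select a column that way: a column needs an explicit row part, as in test.loc[:, 'E'].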
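The comma rule can be checked directly. This sketch again assumes a deterministic version of the test frame; the names row, col, cell and raised are illustrative.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the article's random `test` frame (assumption).
test = pd.DataFrame(np.arange(20).reshape(4, 5),
                    index=['A', 'B', 'C', 'D'],
                    columns=['E', 'F', 'G', 'H', 'I'])

# No comma: the single key is always read as a ROW label.
row = test.loc['A']            # Series named 'A', indexed by the column labels

# To get a column, spell out the row part explicitly with ':'.
col = test.loc[:, 'E']         # Series named 'E', indexed by the row labels

# Rows first, then columns: two labels give one scalar.
cell = test.loc['B', 'F']

# A column label without the comma fails, because 'E' is not a row label.
try:
    test.loc['E']
    raised = False
except KeyError:
    raised = True

print(row.name, col.name, cell, raised)   # A E 6 True
```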
import pandas as pd
import numpy as np
test=pd.DataFrame(np.random.randn(20).reshape(4,5),index=['A','B','C','D'],columns=['E','F','G','H','I'])
test
Out[4]:
          E         F         G         H         I
A -0.833316 -1.982666  1.055594  0.781759 -0.107631
B -1.514709 -1.422883  0.204399 -0.487639 -1.652785
C -0.424735  0.400529 -0.786582  0.855885  0.059894
D  2.016221 -1.314878 -1.745535 -0.907778  0.834966
test.loc['A']
Out[5]:
E   -0.833316
F   -1.982666
G    1.055594
H    0.781759
I   -0.107631
Name: A, dtype: float64
test.loc['E']
KeyError: 'the label [E] is not in the [index]'
# See? It's a "closed interval"
test.loc['A':'B','E':'F']
Out[8]:
          E         F
A -0.833316 -1.982666
B -1.514709 -1.422883
With a label slice, the interval is closed: the label after the colon is included as well.
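To make the closed interval concrete, this sketch compares an ordinary Python list slice with a .loc label slice, again on an assumed deterministic version of the test frame.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the article's random `test` frame (assumption).
test = pd.DataFrame(np.arange(20).reshape(4, 5),
                    index=['A', 'B', 'C', 'D'],
                    columns=['E', 'F', 'G', 'H', 'I'])

labels = ['A', 'B', 'C', 'D']
py_slice = labels[0:1]              # plain Python slicing stops BEFORE the end: ['A']

sub = test.loc['A':'B', 'E':'F']    # .loc includes BOTH endpoints on both axes

# 'A':'B' by label covers the same rows as positions 0, 1 (iloc[0:2]).
same = test.loc['A':'B'].equals(test.iloc[0:2])

print(py_slice, list(sub.index), list(sub.columns), same)
```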
iloc
.iloc is primarily integer position based (from 0 to length-1 of the axis), but may also be used with a boolean array.
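iloc selects mainly by position. Note that positional slicing is a half-open interval (left-closed, right-open): df.iloc[m:n] selects only the rows at positions m through n-1.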
- An integer e.g. 5
- A list or array of integers [4, 3, 0]
- A slice object with ints 1:7
- A boolean array
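A sketch of the same four input types with iloc, on an assumed deterministic test frame (it has only 4 rows, so the integer list is [3, 0] rather than [4, 3, 0]). One difference from loc worth knowing: iloc does not accept a boolean Series that carries an index, so the mask is converted to a plain NumPy array first.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the article's random `test` frame (assumption).
test = pd.DataFrame(np.arange(20).reshape(4, 5),
                    index=['A', 'B', 'C', 'D'],
                    columns=['E', 'F', 'G', 'H', 'I'])

first_row = test.iloc[0]                 # integer -> Series (row 'A')
picked = test.iloc[[3, 0]]               # list of ints -> rows 'D', 'A', in that order
middle = test.iloc[1:3]                  # slice -> positions 1 and 2 ('B', 'C')
mask = (test['E'] > 5).to_numpy()        # iloc needs a plain boolean array here
masked = test.iloc[mask]                 # rows 'C', 'D'

print(first_row.name, list(picked.index), list(middle.index), list(masked.index))
```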
import pandas as pd
import numpy as np
test=pd.DataFrame(np.random.randn(20).reshape(4,5),index=['A','B','C','D'],columns=['E','F','G','H','I'])
test
Out[4]:
          E         F         G         H         I
A -0.833316 -1.982666  1.055594  0.781759 -0.107631
B -1.514709 -1.422883  0.204399 -0.487639 -1.652785
C -0.424735  0.400529 -0.786582  0.855885  0.059894
D  2.016221 -1.314878 -1.745535 -0.907778  0.834966
# See? This one is a half-open interval (left-closed, right-open)!
test.iloc[0:1,0:1]
Out[10]:
          E
A -0.833316
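A small sketch of the m:n rule, on an assumed deterministic test frame: a positional slice m:n returns n - m rows, and slicing keeps the result a DataFrame (as with the 1x1 frame above), whereas two plain integers return a single value.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the article's random `test` frame (assumption).
test = pd.DataFrame(np.arange(20).reshape(4, 5),
                    index=['A', 'B', 'C', 'D'],
                    columns=['E', 'F', 'G', 'H', 'I'])

m, n = 1, 3
block = test.iloc[m:n, 0:1]    # rows at positions m..n-1, column at position 0 only
scalar = test.iloc[0, 0]       # two integers -> a single value, not a DataFrame

print(block.shape, list(block.index), scalar)   # (2, 1) ['B', 'C'] 0
```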
ix
.ix supports mixed integer and label based access. It is primarily label based, but will fall back to integer positional access unless the corresponding axis is of integer type.
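ix is the all-in-one selector! It supports both position-based and label-based selection, though it is primarily label based. Note that .ix was deprecated in pandas 0.20.0 and removed in pandas 1.0, so the ix examples below only run on older versions; current code should use .loc or .iloc.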
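Because ix is gone from current pandas, a mixed request such as "the first two rows by position, columns 'E' through 'F' by label" has to be expressed through loc or iloc by converting one axis. This is a sketch under the assumption of a deterministic test frame; both routes should give the same result.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the article's random `test` frame (assumption).
test = pd.DataFrame(np.arange(20).reshape(4, 5),
                    index=['A', 'B', 'C', 'D'],
                    columns=['E', 'F', 'G', 'H', 'I'])

# Option 1: turn the row positions into labels, then use .loc.
via_loc = test.loc[test.index[0:2], 'E':'F']

# Option 2: turn the column labels into positions, then use .iloc.
start = test.columns.get_loc('E')
stop = test.columns.get_loc('F') + 1    # +1 because iloc slices exclude the end
via_iloc = test.iloc[0:2, start:stop]

print(via_loc.equals(via_iloc))         # True
```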
import pandas as pd
import numpy as np
test=pd.DataFrame(np.random.randn(20).reshape(4,5),index=['A','B','C','D'],columns=['E','F','G','H','I'])
test
Out[4]:
          E         F         G         H         I
A -0.833316 -1.982666  1.055594  0.781759 -0.107631
B -1.514709 -1.422883  0.204399 -0.487639 -1.652785
C -0.424735  0.400529 -0.786582  0.855885  0.059894
D  2.016221 -1.314878 -1.745535 -0.907778  0.834966
# The `ix` call below works much like `loc`, right?
test.ix['A':'B','E':'F']
Out[12]:
          E         F
A -0.833316 -1.982666
B -1.514709 -1.422883
# And this one works much like `iloc`
test.ix[0:1,0:1]
Out[11]:
          E
A -0.833316
Note, however, that when the index or columns are integers, ix actually selects by label, so its slices are closed intervals (both endpoints included).
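With ix removed, the same pitfall shows up today as the difference between .loc (label based, which is what ix did on integer axes) and .iloc (position based). A sketch on a hypothetical frame whose integer labels differ from the positions:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: integer index labels that are NOT the row positions.
df = pd.DataFrame({'v': np.arange(4)}, index=[2, 4, 6, 8])

by_label = df.loc[2:6]       # labels 2, 4, 6: closed interval, 3 rows
by_position = df.iloc[2:6]   # positions 2, 3: half-open, clipped at the end of the frame

print(list(by_label.index), list(by_position.index))   # [2, 4, 6] [6, 8]
```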
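References
It turns out the official documentation explains all this in the most detail! I hope to read more of it when I get the chance.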
