Selecting Rows and Columns in Pandas DataFrames Using iloc, loc and ix

This article is a repost of the original (2021-02-10 14:35).

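There are three main options for selecting and indexing data in Pandas, which can be confusing. The three selection methods covered in this post are iloc (selection by integer position), loc (selection by label or by boolean condition) and ix (a deprecated hybrid of the two).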

Data setup

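This blog post, inspired by other tutorials, describes selection activities with these operations. The tutorial is suited to the general data science situation, where I typically find that:

Each row in the data frame represents a data sample.
Each column is a variable, and is usually named. I rarely select columns without their names.
I frequently need to select relevant rows from the data frame for modelling and visualisation.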

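For those just starting out, the Pandas library for Python provides high-performance, easy-to-use data structures and data analysis tools for handling tabular data in "Series" and "DataFrames". It is very good at making your data processing easier. I have previously written about grouping and summarising data with Pandas.

Summary of the iloc and loc methods discussed in this blog post. iloc and loc are operations for retrieving data from Pandas DataFrames.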

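Selection and indexing methods for Pandas DataFrames

For these explorations we need some sample data. I downloaded the uk-500 sample data set from www.briandunning.com. This data contains artificial names, addresses, companies and phone numbers for fictitious UK characters. To follow along, you can download the .csv file from the URL used in the code. Load the data as follows (the screenshot is from a Jupyter notebook in an Anaconda Python installation):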

import pandas as pd
import random

# Read the data from the downloaded CSV file.
data = pd.read_csv('https://s3-eu-west-1.amazonaws.com/shanebucket/downloads/uk-500.csv')
# Set a numeric id for use as an index in the examples.
# The ids are random, so they may repeat and will differ on every run.
data['id'] = [random.randint(0, 1000) for _ in range(data.shape[0])]

data.head(5)

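Sample data loaded from the CSV file.

1. Selecting Pandas data using "iloc"

The iloc indexer for Pandas DataFrames is used for integer-location based indexing, that is, selection by position.

The iloc indexer syntax is data.iloc[<row selection>, <column selection>], which is sure to be a source of confusion for R users. "iloc" in Pandas selects rows and columns by number, in the order in which they appear in the data frame. You can imagine that each row has a row number from 0 to the total number of rows minus one (data.shape[0] - 1), and iloc[] allows selections based on these numbers. The same applies to columns (which range from 0 to data.shape[1] - 1).

iloc takes two "arguments": a row selector and a column selector. For example: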

# Single selections using iloc and DataFrame
# Rows:
data.iloc[0]  # first row of data frame (Aleshia Tomkiewicz) - note a Series data type output
data.iloc[1]  # second row of data frame (Evan Zigomalas)
data.iloc[-1]  # last row of data frame (Mi Richan)
# Columns:
data.iloc[:, 0]  # first column of data frame (first_name)
data.iloc[:, 1]  # second column of data frame (last_name)
data.iloc[:, -1]  # last column of data frame (id)

Multiple columns and rows can be selected together using the .iloc indexer.

# Multiple row and column selections using iloc and DataFrame
data.iloc[0:5]  # first five rows of data frame
data.iloc[:, 0:2]  # first two columns of data frame with all rows
data.iloc[[0, 3, 6, 24], [0, 5, 6]]  # 1st, 4th, 7th, 25th rows + 1st, 6th, 7th columns
data.iloc[0:5, 5:8]  # first 5 rows and the columns at positions 5, 6, 7, i.e. the 6th to 8th columns (county -> phone1)

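There are two gotchas to remember when using iloc in this manner:

Note that .iloc returns a Pandas Series when a single row or a single column is selected with a scalar, and a Pandas DataFrame when multiple rows or columns are selected. To counter this, pass a single-valued list if you require DataFrame output.

When using .loc or .iloc, you can control the output format by passing lists or single values to the selectors.
When selecting multiple columns or rows in this manner, remember that in a selection such as [1:5], the rows/columns selected run from the first number to one less than the second number. For example, [1:5] gives 1, 2, 3, 4; in general, [x:y] runs from x to y-1.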
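Both gotchas can be checked on a small table. The sketch below is a minimal illustration that uses a hypothetical six-row DataFrame (the names and values are made up and are not taken from the uk-500 file):

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the two gotchas.
df = pd.DataFrame({
    "first_name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
    "city": ["Leeds", "Bath", "York", "Hull", "Ely", "Deal"],
    "age": [31, 45, 27, 52, 38, 60],
})

# Gotcha 1: a scalar selector returns a Series, a one-element list returns a DataFrame.
row_as_series = df.iloc[0]
row_as_frame = df.iloc[[0]]
col_as_series = df.iloc[:, 0]
col_as_frame = df.iloc[:, [0]]

# Gotcha 2: positional slices exclude the end position, so 1:5 selects positions 1 to 4.
sliced = df.iloc[1:5]

print(type(row_as_series).__name__, type(row_as_frame).__name__)
print(sliced["first_name"].tolist())
```

Wrapping the selector in a list is usually the simplest way to keep a DataFrame when downstream code expects one.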

In practice, I rarely use the iloc indexer, unless I want the first (.iloc[0]) or the last (.iloc[-1]) row of the data frame.

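2. Selecting Pandas data using "loc"

The Pandas loc indexer can be used with DataFrames for two different use cases:

a.) Selecting rows by label/index
b.) Selecting rows with a boolean/conditional lookup

The loc indexer is used with the same syntax as iloc: data.loc[<row selection>, <column selection>].

2a. Label-based / index-based indexing using .loc

Selections using the loc method are based on the index of the data frame (if any). Where an index has been set on a DataFrame using df.set_index(), the .loc method selects directly on the index values of the rows. For example, setting the index of our test data frame to the persons' "last_name":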

data.set_index("last_name", inplace=True)
data.head()

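Last name set as the index on the sample data frame.

Now with the index set, we can directly select rows for different "last_name" values using .loc[<label>], either singly or in multiples. For example:

Selecting single or multiple rows using .loc index selections with Pandas. Note that the first example returns a Series, and the second returns a DataFrame. You can get a single-row DataFrame by passing a single-element list to the .loc operation.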
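As a rough sketch of the selections described in that caption, the snippet below builds a hypothetical four-row DataFrame indexed by "last_name" (the first names and cities are invented for illustration) and shows the three return shapes:

```python
import pandas as pd

# Hypothetical toy data with 'last_name' as the index.
df = pd.DataFrame({
    "last_name": ["Andrade", "Bruch", "Julio", "Veness"],
    "first_name": ["Ana", "Ben", "Jo", "Val"],
    "city": ["Leeds", "Bath", "York", "Hull"],
}).set_index("last_name")

one_row = df.loc["Andrade"]                # single label -> Series
two_rows = df.loc[["Andrade", "Veness"]]   # list of labels -> DataFrame
one_row_frame = df.loc[["Andrade"]]        # one-element list -> one-row DataFrame

print(type(one_row).__name__, type(two_rows).__name__, one_row_frame.shape)
```

This assumes the index labels are unique; with repeated labels, a single label can match several rows.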

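Select columns with .loc using the names of the columns. In most of my data work, I typically have named columns, and use these named selections.

When using the .loc indexer, columns are referred to by name, using lists of strings or ":" slices.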
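The two ways of naming columns can be sketched as follows, again on a hypothetical DataFrame whose values are invented for illustration:

```python
import pandas as pd

# Hypothetical toy data indexed by last name.
df = pd.DataFrame(
    {
        "first_name": ["Ana", "Ben", "Val"],
        "address": ["1 High St", "2 Mill Ln", "3 Park Rd"],
        "city": ["Leeds", "Bath", "Hull"],
        "email": ["ana@example.com", "ben@example.com", "val@example.com"],
    },
    index=pd.Index(["Andrade", "Bruch", "Veness"], name="last_name"),
)

# A list of names picks exactly those columns, in the order given.
by_list = df.loc[:, ["first_name", "city"]]

# A label slice picks every column from 'address' to 'email', including both endpoints.
by_slice = df.loc[:, "address":"email"]

print(by_list.columns.tolist())
print(by_slice.columns.tolist())
```

Unlike the positional slices used with .iloc, label slices in .loc include the end label.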

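You can also select ranges of index labels: the selection data.loc['Bruch':'Julio'] returns all rows in the data frame between the index entries for "Bruch" and "Julio", with both endpoints included. The following examples should now make sense: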

# Select rows with index values 'Andrade' and 'Veness', with all columns between 'city' and 'email'
data.loc[['Andrade', 'Veness'], 'city':'email']
# Select every row from 'Andrade' to 'Veness' (an inclusive label slice), with just 'first_name', 'address' and 'city' columns
data.loc['Andrade':'Veness', ['first_name', 'address', 'city']]

# Change the index to be based on the 'id' column.
# reset_index() first returns 'last_name' to the columns; drop=False keeps 'id' as a column for the later filters.
data.reset_index(inplace=True)
data.set_index('id', drop=False, inplace=True)
# Select the row(s) with 'id' = 487 (raises KeyError if no row received that random id)
data.loc[487]

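Note that in the last example, data.loc[487] (the row with index value 487) is not the same as data.iloc[487] (the row at position 487, that is, the 488th row in the data). The index of a DataFrame can be out of numeric order, and/or made of strings or multiple values.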
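The difference is easiest to see when the integer index is deliberately out of order. A minimal sketch, using a hypothetical three-row DataFrame:

```python
import pandas as pd

# Hypothetical toy data with an unordered integer index named 'id'.
df = pd.DataFrame(
    {"first_name": ["Ann", "Bob", "Cid"]},
    index=pd.Index([487, 2, 0], name="id"),
)

by_label = df.loc[0]       # the row whose index VALUE is 0
by_position = df.iloc[0]   # the row at POSITION 0 (the first row)

print(by_label["first_name"], by_position["first_name"])
```

Here the label lookup returns "Cid" while the positional lookup returns "Ann", even though both selectors are written as 0.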

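2b. Boolean / logical indexing using .loc

Conditional selection with boolean arrays using data.loc[<selection>] is the method I use most often with Pandas DataFrames. With boolean indexing or logical selection, you pass an array or Series of True/False values to the .loc indexer to select the rows where the Series has True values.

In most use cases, you will make selections based on the values of different columns in your data set.

For example, the statement data['first_name'] == 'Antonio' produces a Pandas Series with a True/False value for every row in the "data" DataFrame, with "True" for the rows where first_name is "Antonio". Boolean arrays of this kind can be passed directly to the .loc indexer, like so:

Using a boolean True/False Series to select rows in a Pandas data frame: all rows with the first name "Antonio" are selected.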
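A minimal sketch of that two-step pattern, using a hypothetical four-row DataFrame in place of the uk-500 data:

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "first_name": ["Antonio", "Beth", "Antonio", "Carl"],
    "city": ["Leeds", "Bath", "York", "Hull"],
})

# Step 1: the comparison yields one True/False value per row.
mask = df["first_name"] == "Antonio"

# Step 2: .loc keeps only the rows where the mask is True.
antonios = df.loc[mask]

print(mask.tolist())
print(antonios["city"].tolist())
```

Keeping the mask in its own variable makes it easy to inspect or reuse before it is applied.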

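As before, a second argument can be passed to .loc to select particular columns out of the data frame. Again, columns are referred to by name for the loc indexer, and the selector can be a single string, a list of columns, or a ":" slice.

Selecting multiple columns with loc can be achieved by passing column names to the second argument of .loc[].

Note that when selecting columns, if only one column is selected, the .loc operator returns a Series. For a single-column DataFrame, use a one-element list to keep the DataFrame format, for example:

If a single column is selected with a string, a Series is returned from .loc. Pass a list to get a DataFrame back.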
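The Series-versus-DataFrame behaviour for a single column can be sketched like this, with a hypothetical DataFrame and invented email addresses:

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "first_name": ["Antonio", "Beth", "Antonio"],
    "email": ["a1@example.com", "beth@example.com", "a2@example.com"],
})

mask = df["first_name"] == "Antonio"

email_series = df.loc[mask, "email"]     # column given as a string -> Series
email_frame = df.loc[mask, ["email"]]    # column given as a one-element list -> DataFrame

print(type(email_series).__name__, type(email_frame).__name__)
```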

For clarity, make sure you understand the following additional examples of .loc selections:

    
# Select rows with first name Antonio, and all columns between 'city' and 'email'
data.loc[data['first_name'] == 'Antonio', 'city':'email']

# Select rows where the email column ends with 'hotmail.com', include all columns
data.loc[data['email'].str.endswith("hotmail.com")]

# Select rows with first_name equal to some values, all columns
data.loc[data['first_name'].isin(['France', 'Tyisha', 'Eric'])]

# Select rows with first name Antonio AND gmail email addresses
data.loc[data['email'].str.endswith("gmail.com") & (data['first_name'] == 'Antonio')]

# Select rows with id column between 100 and 200 (100 < id <= 200), and just return 'postal' and 'web' columns
data.loc[(data['id'] > 100) & (data['id'] <= 200), ['postal', 'web']]

# A lambda function that yields True/False values can also be used.
# Select rows where the company name has 4 words in it.
data.loc[data['company_name'].apply(lambda x: len(x.split(' ')) == 4)]

# Selections can be achieved outside of the main .loc for clarity:
# Form a separate variable with your selections:
idx = data['company_name'].apply(lambda x: len(x.split(' ')) == 4)
# Select only the True values in 'idx' and only the 3 columns specified:
data.loc[idx, ['email', 'first_name', 'company_name']]

Logical selections and boolean Series can also be passed to the generic [] indexer of a pandas DataFrame, and will give the same results: data.loc[data['id'] == 9] == data[data['id'] == 9].
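One way to confirm the equivalence is sketched below on a hypothetical DataFrame. Note that comparing two DataFrames with == gives an element-wise table of True/False values, so the sketch uses DataFrame.equals to get a single answer:

```python
import pandas as pd

# Hypothetical toy data with a repeated id.
df = pd.DataFrame({
    "id": [9, 12, 9],
    "first_name": ["Ann", "Bob", "Cid"],
})

via_loc = df.loc[df["id"] == 9]
via_brackets = df[df["id"] == 9]

# equals() compares shape, labels and values and returns one boolean.
same = via_loc.equals(via_brackets)
print(same)
```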

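3. Selecting Pandas data using ix

Note: the ix indexer was deprecated starting with Pandas version 0.20.1, and it has been removed entirely from Pandas 1.0 onwards.

The ix[] indexer is a hybrid of .loc and .iloc. Generally, ix is label based and acts just like the .loc indexer. However, .ix also supports integer-type selections (as in .iloc) when passed an integer. This only works where the index of the DataFrame is not integer based. ix will accept any of the inputs of .loc and .iloc.

Because this behaviour is slightly more complex, I prefer to use .iloc and .loc explicitly to avoid unexpected results.

As an example: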

# These lines run only on pandas versions that still provide .ix (before 1.0),
# and assume 'last_name' is the index, as in section 2a.
# ix indexing works just the same as .loc when passed strings
data.ix[['Andrade']] == data.loc[['Andrade']]
# ix indexing works the same as .iloc when passed integers.
data.ix[[33]] == data.iloc[[33]]

# ix only works in both modes when the index of the DataFrame is NOT an integer itself.
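On current pandas versions, where .ix is no longer available, mixed label/position selections can be written with .loc and .iloc alone. The following is one possible sketch rather than the only approach; the DataFrame is hypothetical, and the idea is to translate positions to labels (or labels to positions) before indexing:

```python
import pandas as pd

# Hypothetical toy data indexed by last name.
df = pd.DataFrame(
    {
        "first_name": ["Ana", "Ben", "Val"],
        "city": ["Leeds", "Bath", "Hull"],
        "email": ["ana@example.com", "ben@example.com", "val@example.com"],
    },
    index=pd.Index(["Andrade", "Bruch", "Veness"], name="last_name"),
)

# Rows by label, columns by position: turn positions into names, then use .loc.
label_rows_pos_cols = df.loc[["Andrade", "Veness"], df.columns[[0, 2]]]

# Rows by position, columns by label: turn names into positions, then use .iloc.
pos_rows_label_cols = df.iloc[[0, 2], df.columns.get_indexer(["city"])]

print(label_rows_pos_cols.columns.tolist())
print(pos_rows_label_cols.index.tolist())
```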

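Setting values in DataFrames using .loc

With a slight change of syntax, you can update your DataFrame in the same statement in which you select and filter with the .loc indexer. This pattern lets you update values in columns depending on different conditions. The setting operation does not make a copy of the data frame, but edits the original data.

As an example: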

# Change the first name of all rows with an ID greater than 2000 to "John"
data.loc[data['id'] > 2000, "first_name"] = "John"
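The effect of such an assignment can be checked on a small table. This sketch uses a hypothetical three-row DataFrame in which two ids exceed 2000:

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "id": [10, 2500, 3000],
    "first_name": ["Ann", "Bob", "Cid"],
})

# Select rows by condition and assign to one column in a single statement.
df.loc[df["id"] > 2000, "first_name"] = "John"

print(df["first_name"].tolist())
```

Only the rows matching the condition change, and the change is made to df itself rather than to a copy.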

