Selecting Rows and Columns in Pandas DataFrames Using iloc, loc and ix

Reposted article, 2021-02-10 14:35

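There are three main options for selecting and indexing data in Pandas, which can be confusing. The three selection cases and methods covered in this post are:

Selecting data by row numbers (.iloc)
Selecting data by label or by a conditional statement (.loc)
Selecting in a hybrid approach (.ix), now deprecated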

Data Setup

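This blog post, inspired by other tutorials, describes selection activities with these operations. The tutorial applies to the general data science situation in which I typically find myself: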

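Each row in the data frame represents a data sample.
Each column is a variable, and is usually named. I rarely select columns without their names.
I need to select relevant rows from the data frame quickly and often for modelling and visualisation activities.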

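For those getting started, Python's Pandas library provides high-performance, easy-to-use data structures and data analysis tools for handling tabular data in "Series" and "DataFrames". It is brilliant at making your data processing easier. I have written previously about grouping and summarising data with Pandas.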

A summary of the iloc and loc methods discussed in this blog post: iloc and loc are the operations used to retrieve data from Pandas DataFrames.

Selection and Indexing Methods for Pandas DataFrames

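For these explorations we'll need some sample data: I downloaded the uk-500 sample data set from www.briandunning.com. This data contains artificial names, addresses, companies and phone numbers for fictitious UK characters. To follow along, you can download the .csv file from the URL used in the code below. The data is loaded as follows (this snippet comes from a Jupyter notebook on an Anaconda Python installation):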

import pandas as pd
import random
 
# read the data from the downloaded CSV file.
data = pd.read_csv('https://s3-eu-west-1.amazonaws.com/shanebucket/downloads/uk-500.csv')
# set a numeric id for use as an index in the examples (values are random, so they can repeat).
data['id'] = [random.randint(0,1000) for x in range(data.shape[0])]
 
data.head(5)

Example data loaded from the CSV file.

1. Selecting Pandas data using "iloc"

The iloc indexer for a Pandas DataFrame is used for integer-location based indexing, that is, selection by position.

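The syntax of the iloc indexer is data.iloc[<row selection>, <column selection>], which is sure to be a source of confusion for R users. "iloc" in Pandas is used to select rows and columns by number, in the order in which they appear in the data frame. You can imagine each row having a row number from 0 to the total number of rows minus one (data.shape[0] - 1), and iloc[] allows selections based on these numbers. The same applies to columns (numbered from 0 to data.shape[1] - 1).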

iloc takes two "arguments": a row selector and a column selector. For example:

# Single selections using iloc and DataFrame
# Rows:
data.iloc[0] # first row of data frame (Aleshia Tomkiewicz) - Note a Series data type output.
data.iloc[1] # second row of data frame (Evan Zigomalas)
data.iloc[-1] # last row of data frame (Mi Richan)
# Columns:
data.iloc[:,0] # first column of data frame (first_name)
data.iloc[:,1] # second column of data frame (last_name)
data.iloc[:,-1] # last column of data frame (id)
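The calls above need the downloaded CSV. As a self-contained sketch, the same selectors can be tried on a tiny hand-made DataFrame. The rows here are hypothetical stand-ins (only the first two names come from the sample data), and the point is the return type and the effect of negative positions.

```python
import pandas as pd

# Hypothetical stand-in for the uk-500 data; values chosen for illustration only.
df = pd.DataFrame({
    "first_name": ["Aleshia", "Evan", "Ann"],
    "last_name": ["Tomkiewicz", "Zigomalas", "Smith"],
    "id": [10, 487, 33],
})

first_row = df.iloc[0]     # single row by position -> Series indexed by column names
last_row = df.iloc[-1]     # negative positions count back from the end
first_col = df.iloc[:, 0]  # single column by position -> Series indexed by row labels
last_col = df.iloc[:, -1]

print(type(first_row).__name__)   # Series
print(first_row["first_name"])    # Aleshia
print(last_row["last_name"])      # Smith
print(first_col.tolist())         # ['Aleshia', 'Evan', 'Ann']
print(last_col.tolist())          # [10, 487, 33]
```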

Multiple rows and columns can be selected together using the .iloc indexer.

# Multiple row and column selections using iloc and DataFrame
data.iloc[0:5] # first five rows of dataframe
data.iloc[:, 0:2] # first two columns of data frame with all rows
data.iloc[[0,3,6,24], [0,5,6]] # 1st, 4th, 7th, 25th row + 1st 6th 7th columns.
data.iloc[0:5, 5:8] # first 5 rows and 6th, 7th, 8th columns of data frame (county -> phone1).
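A small sketch of the multi-selection forms, using a hypothetical 4x4 frame whose cells encode their own position ("r<row>c<col>"), so it is easy to see which cells a list or slice picks out. Lists return exactly the positions given, in the order given; slices return contiguous runs.

```python
import pandas as pd

# Hypothetical frame: each cell records its own row/column position.
cols = ["first_name", "last_name", "company_name", "address"]
df = pd.DataFrame([[f"r{r}c{c}" for c in range(4)] for r in range(4)], columns=cols)

picked = df.iloc[[3, 0], [2, 0]]  # lists: exactly these positions, in this order
mixed = df.iloc[[0, 2], 1:3]      # a list for rows, a slice for columns

print(picked.index.tolist())      # [3, 0]
print(list(picked.columns))       # ['company_name', 'first_name']
print(picked.iloc[0, 0])          # r3c2
print(mixed.to_numpy().tolist())  # [['r0c1', 'r0c2'], ['r2c1', 'r2c2']]
```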

There are two "gotchas" to remember when using iloc in this manner:

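First, note that .iloc returns a Pandas Series when exactly one of the selectors is a single integer (for example one row, or one full column), and a Pandas DataFrame when the selections are lists or slices. To counter this, pass a single-valued list if you require DataFrame output.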

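When using .loc or .iloc, you can control the output format by passing lists or single values to the selectors.
Second, when selecting multiple columns or rows with a slice such as [1:5], the rows/columns selected run from the first number to one less than the second number: [1:5] selects 1, 2, 3, 4, and in general [x:y] selects x to y-1.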
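Both gotchas can be checked directly. The sketch below, on a hypothetical four-row frame, compares scalar selectors with one-element lists and shows that a slice stops before its second number.

```python
import pandas as pd

# Hypothetical four-row frame.
df = pd.DataFrame({"first_name": ["Aleshia", "Evan", "Ann", "Bo"],
                   "id": [10, 487, 33, 912]})

row_series = df.iloc[0]      # scalar row position -> Series
row_frame = df.iloc[[0]]     # one-element list -> one-row DataFrame
col_series = df.iloc[:, 1]   # scalar column position -> Series
col_frame = df.iloc[:, [1]]  # one-element list -> one-column DataFrame

middle = df.iloc[1:3]        # positions 1 and 2; position 3 is excluded

print(type(row_series).__name__, type(row_frame).__name__)  # Series DataFrame
print(type(col_series).__name__, type(col_frame).__name__)  # Series DataFrame
print(row_frame.shape, col_frame.shape)                     # (1, 2) (4, 1)
print(middle["first_name"].tolist())                        # ['Evan', 'Ann']
```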

In practice, I rarely use the iloc indexer, unless I want the first (.iloc[0]) or the last (.iloc[-1]) row of the data frame.

2. Selecting Pandas data using "loc"

The Pandas loc indexer can be used with DataFrames for two different use cases:

a.) Selecting rows by label/index
b.) Selecting rows with a boolean/conditional lookup

The loc indexer is used with the same syntax as iloc: data.loc[<row selection>, <column selection>].

2a. Label-based / index-based indexing using .loc

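Selections using the loc method are based on the index of the data frame (if any). Where the index is set on a DataFrame using df.set_index(), the .loc method selects directly based on the index values of the rows. For example, setting the index of the test data frame to the person's "last_name":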

data.set_index("last_name", inplace=True)
data.head()

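Last name set as the index on the sample data frame.

Now that the index is set, we can use .loc[<label>] to select rows directly by "last_name" value, either singly or in multiples. For example: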

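Selecting single or multiple rows using .loc index selections with Pandas. Note that the first example returns a Series, and the second returns a DataFrame. You can get a single-row DataFrame by passing a single-element list to the .loc operation.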
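A minimal sketch of these row selections on a hypothetical frame indexed by last_name (the first names and cities are invented for illustration): a single label returns a Series, a list of labels returns a DataFrame.

```python
import pandas as pd

# Hypothetical data indexed by last_name.
df = pd.DataFrame(
    {"first_name": ["France", "Lai", "Evan"], "city": ["Leeds", "Bath", "York"]},
    index=pd.Index(["Andrade", "Veness", "Zigomalas"], name="last_name"),
)

one = df.loc["Andrade"]                  # single label -> Series
several = df.loc[["Andrade", "Veness"]]  # list of labels -> DataFrame
one_frame = df.loc[["Andrade"]]          # one-element list -> one-row DataFrame

print(one["city"])             # Leeds
print(several.index.tolist())  # ['Andrade', 'Veness']
print(one_frame.shape)         # (1, 2)
```

If the index contains duplicate labels, a single label can also return a DataFrame, because it matches several rows.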

Selecting columns with .loc using the column names. In most of my data work, I have named columns and use these named selections.

When using the .loc indexer, columns are referred to by name, using lists of strings or ":" slices.
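A short sketch of name-based column selection on a hypothetical frame (all values invented). One detail worth seeing: unlike positional slices with .iloc, label slices with .loc include both endpoints, for columns and for rows alike.

```python
import pandas as pd

# Hypothetical frame indexed by last_name; values invented for illustration.
df = pd.DataFrame(
    {
        "first_name": ["France", "Ann", "Bo", "Lai"],
        "city": ["Leeds", "York", "Hull", "Bath"],
        "county": ["W Yorks", "N Yorks", "E Yorks", "Somerset"],
        "email": ["a@x.com", "b@x.com", "c@x.com", "d@x.com"],
    },
    index=pd.Index(["Andrade", "Bruch", "Julio", "Veness"], name="last_name"),
)

named = df.loc[:, ["first_name", "email"]]  # list of names, in the order given
span = df.loc[:, "city":"email"]            # slice of names: both ends included
rows = df.loc["Bruch":"Julio"]              # label slice over rows: also inclusive

print(list(named.columns))  # ['first_name', 'email']
print(list(span.columns))   # ['city', 'county', 'email']
print(rows.index.tolist())  # ['Bruch', 'Julio']
```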

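You can select ranges of index labels: the selection data.loc['Bruch':'Julio'] returns all rows in the data frame between the index entries for "Bruch" and "Julio", including both endpoints. The following examples should now make sense: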

# Select rows with index values 'Andrade' and 'Veness', with all columns between 'city' and 'email'
data.loc[['Andrade', 'Veness'], 'city':'email']
# Select all rows from 'Andrade' to 'Veness' (inclusive), with just 'first_name', 'address' and 'city' columns
data.loc['Andrade':'Veness', ['first_name', 'address', 'city']]
 
# Change the index to be based on the 'id' column (the current last_name index is discarded)
data.set_index('id', inplace=True)
# select the row with 'id' = 487
data.loc[487]

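Note that in the last example, data.loc[487] (the row or rows with the index value 487) is not equal to data.iloc[487] (the 488th row in the data). Because the ids were generated randomly, 487 may be missing (raising a KeyError) or may appear more than once. The index of a DataFrame can be out of numeric order, can consist of strings, and can have multiple levels.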
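The label/position distinction is easiest to see with a small, hypothetical integer index that is out of order. In the sketch below, label 0 is the third row, while position 0 is the first row; position 487 does not exist at all.

```python
import pandas as pd

# Hypothetical frame whose index is an unordered integer id.
df = pd.DataFrame(
    {"first_name": ["Aleshia", "Evan", "Ann"]},
    index=pd.Index([487, 2, 0], name="id"),
)

by_label = df.loc[0, "first_name"]  # row whose *label* is 0 (the third row)
by_position = df.iloc[0, 0]         # first row, whatever its label
print(by_label, by_position)        # Ann Aleshia

print(df.loc[487, "first_name"])    # Aleshia (label 487 exists)
try:
    df.iloc[487]                    # position 487 does not: only 3 rows
    out_of_bounds = False
except IndexError:
    out_of_bounds = True
print(out_of_bounds)                # True
```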

2b. Boolean / logical indexing using .loc

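Conditional selection with boolean arrays using data.loc[<selection>] is the method I use most often with Pandas DataFrames. With boolean indexing or logical selection, you pass an array or Series of True/False values to the .loc indexer to select the rows where the Series has True values.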

In most use cases, you will make selections based on the values of different columns in your data set.

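For example, the statement data['first_name'] == 'Antonio' produces a Pandas Series with a True/False value for every row in the "data" DataFrame, with "True" for the rows where first_name is "Antonio". Boolean arrays of this kind can be passed directly to the .loc indexer, as follows: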

Using a boolean True/False Series to select rows in a Pandas data frame: all rows with the first name "Antonio" are selected.
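A minimal sketch of this lookup on a few hypothetical rows: the comparison builds a boolean Series aligned with the DataFrame's index, and .loc keeps the rows where it is True.

```python
import pandas as pd

# Hypothetical rows; two of them share the first name "Antonio".
df = pd.DataFrame({
    "first_name": ["Antonio", "Evan", "Antonio"],
    "city": ["Leeds", "York", "Hull"],
})

mask = df["first_name"] == "Antonio"  # boolean Series aligned with df's index
antonios = df.loc[mask]               # keep only rows where mask is True

print(mask.tolist())                  # [True, False, True]
print(antonios.index.tolist())        # [0, 2]
```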

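As before, a second argument can be passed to .loc to select particular columns from the data frame. Again, columns are referred to by name with the loc indexer, and the selection can be a single string, a list of columns, or a ":" slice.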

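Multiple columns can be selected with loc by passing the column names as the second argument of .loc[]. Note that if only one column is selected as a string, the .loc operator returns a Series. For a single-column DataFrame, use a one-element list to keep the DataFrame format, for example: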

If a single column is selected as a string, .loc returns a Series. Pass a list to get a DataFrame back.
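A small sketch combining a boolean row selection with the three column-selector forms, on hypothetical data; only the type and shape of the result change.

```python
import pandas as pd

# Hypothetical data; the email addresses are placeholders.
df = pd.DataFrame({
    "first_name": ["Antonio", "Evan", "Antonio"],
    "city": ["Leeds", "York", "Hull"],
    "email": ["a@example.com", "e@example.com", "t@example.com"],
})
mask = df["first_name"] == "Antonio"

cities = df.loc[mask, "city"]               # string -> Series
cities_df = df.loc[mask, ["city"]]          # one-element list -> DataFrame
two_cols = df.loc[mask, ["city", "email"]]  # list of names -> DataFrame

print(type(cities).__name__, type(cities_df).__name__)  # Series DataFrame
print(cities.tolist())                                  # ['Leeds', 'Hull']
print(two_cols.shape)                                   # (2, 2)
```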

For clarity, make sure you understand the following additional examples of .loc selections:

    
# These examples expect 'id' to be an ordinary column; if it is currently the index, restore it first with data = data.reset_index()
# Select rows with first name Antonio, and all columns between 'city' and 'email'
data.loc[data['first_name'] == 'Antonio', 'city':'email']
 
# Select rows where the email column ends with 'hotmail.com', include all columns
data.loc[data['email'].str.endswith("hotmail.com")]
 
# Select rows with first_name equal to some values, all columns
data.loc[data['first_name'].isin(['France', 'Tyisha', 'Eric'])]
 
# Select rows with first name Antonio AND gmail email addresses
data.loc[data['email'].str.endswith("gmail.com") & (data['first_name'] == 'Antonio')]
 
# Select rows with id greater than 100 and at most 200, and just return 'postal' and 'web' columns
data.loc[(data['id'] > 100) & (data['id'] <= 200), ['postal', 'web']]
 
# A lambda function that yields True/False values can also be used.
# Select rows where the company name has 4 words in it.
data.loc[data['company_name'].apply(lambda x: len(x.split(' ')) == 4)]
 
# Selections can be achieved outside of the main .loc for clarity:
# Form a separate variable with your selections:
idx = data['company_name'].apply(lambda x: len(x.split(' ')) == 4)
# Select only the True values in 'idx' and only the 3 columns specified:
data.loc[idx, ['email', 'first_name', 'company_name']]
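Several of the examples above combine conditions. As a sketch on hypothetical rows: boolean Series combine with & (and), | (or) and ~ (not). Python's & and | bind more tightly than comparison operators, so each comparison needs its own parentheses.

```python
import pandas as pd

# Hypothetical rows; names, emails and ids invented for illustration.
df = pd.DataFrame({
    "first_name": ["Antonio", "Evan", "Antonio", "Eric"],
    "email": ["a@hotmail.com", "e@gmail.com", "t@gmail.com", "r@hotmail.com"],
    "id": [150, 90, 420, 199],
})

is_antonio = df["first_name"] == "Antonio"
is_hotmail = df["email"].str.endswith("hotmail.com")

both = df.loc[is_antonio & is_hotmail]    # AND
either = df.loc[is_antonio | is_hotmail]  # OR
not_hotmail = df.loc[~is_hotmail]         # NOT
# Parenthesise each comparison before combining with &.
in_range = df.loc[(df["id"] > 100) & (df["id"] <= 200), ["first_name"]]
named = df.loc[df["first_name"].isin(["Eric", "Evan"])]

print(both.index.tolist())                # [0]
print(either.index.tolist())              # [0, 2, 3]
print(not_hotmail.index.tolist())         # [1, 2]
print(in_range["first_name"].tolist())    # ['Antonio', 'Eric']
print(named.index.tolist())               # [1, 3]
```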

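Logical selections and boolean Series can also be passed to the generic [] indexer of a Pandas DataFrame and give the same results: data.loc[data['id'] == 9] returns the same rows as data[data['id'] == 9].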
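Writing == between two DataFrames compares them element by element. To check that the two forms return identical results, DataFrame.equals compares whole frames. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical data with two rows where id == 9.
df = pd.DataFrame({"id": [9, 3, 9], "first_name": ["Ann", "Bo", "Cy"]})
mask = df["id"] == 9

via_loc = df.loc[mask]
via_brackets = df[mask]

print(via_loc.equals(via_brackets))  # True
print(via_loc.index.tolist())        # [0, 2]
```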

3. Selecting Pandas data using ix

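Note: The ix indexer has been deprecated in recent versions of Pandas, starting with version 0.20.1, and was removed entirely in pandas 1.0, so the ix examples below only run on older versions.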

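The ix[] indexer is a hybrid of .loc and .iloc. Generally, ix is label based and acts just like the .loc indexer. However, .ix also supports integer-type selections (as in .iloc) when passed an integer. This only works where the index of the DataFrame is not integer based. ix will accept any of the inputs of .loc and .iloc.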

Because it is slightly more complex, I prefer to use .iloc and .loc explicitly to avoid unexpected results.

As an example:

# Requires pandas < 1.0 and the last_name index set in section 2a.
# ix indexing works just the same as .loc when passed strings
data.ix[['Andrade']] == data.loc[['Andrade']]
# ix indexing works the same as .iloc when passed integers.
data.ix[[33]] == data.iloc[[33]]
 
# ix only works in both modes when the index of the DataFrame is NOT an integer itself.
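On current pandas, the same mixed label/position lookups can be written with .loc and .iloc by converting one kind of key into the other: df.columns[i] turns a position into a label, and Index.get_loc turns a label into a position. A sketch on a hypothetical frame indexed by last_name:

```python
import pandas as pd

# Hypothetical frame indexed by last_name.
df = pd.DataFrame(
    {"first_name": ["France", "Lai"], "city": ["Leeds", "Bath"]},
    index=pd.Index(["Andrade", "Veness"], name="last_name"),
)

# Row by label, column by position: translate the position into a label.
a = df.loc["Andrade", df.columns[1]]
# Row by position, column by label: translate the label into a position.
b = df.iloc[1, df.columns.get_loc("city")]

# Explicit equivalents of the two kinds of ix call shown above.
by_label = df.loc[["Andrade"]]
by_position = df.iloc[[1]]

print(a, b)                # Leeds Bath
print(hasattr(df, "ix"))   # False on pandas >= 1.0
```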

Setting values in DataFrames using .loc

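With a slight change of syntax, you can update your DataFrame in the same statement in which you select and filter with the .loc indexer. This pattern lets you update values in a column depending on different conditions. The setting operation does not make a copy of the data frame; it edits the original data.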

As an example:

# Change the first name of all rows with an ID greater than 2000 to "John"
data.loc[data['id'] > 2000, "first_name"] = "John"

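A self-contained sketch of conditional assignment on hypothetical rows. Doing the row selection and the column selection in one .loc call matters: a chained form such as df[df["id"] > 2000]["first_name"] = "John" assigns into an intermediate object and can leave df unchanged. Assigning through .loc to a column that does not exist yet creates it, with NaN in the rows the condition did not match.

```python
import pandas as pd

# Hypothetical rows; two ids exceed 2000.
df = pd.DataFrame({"id": [150, 2500, 3100], "first_name": ["Ann", "Bo", "Cy"]})

# Select rows by condition and a column by name, then assign in one statement.
df.loc[df["id"] > 2000, "first_name"] = "John"
print(df["first_name"].tolist())     # ['Ann', 'John', 'John']

# Assigning to a new column: unmatched rows are filled with NaN.
df.loc[df["id"] > 3000, "segment"] = "high"
print(df["segment"].isna().tolist())  # [True, True, False]
```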
