Pandas-Data Selection

Reposted article, 2016-10-11 11:53

Common data-slicing features of the Pandas package

[]
where (boolean lookup)
isin
query
loc
iloc
ix
map and lambda
contains

Index-based selection on a DataFrame

[]
- With a slice, [] can only slice rows (row/index); the range is closed at the start and open at the end
```
df[0:3]
df[:4]
df[4:]
```
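A self-contained sketch of the rule above, using a small made-up DataFrame (the columns A, B, C and their values are illustrative only). An integer slice inside [] picks rows and excludes the end position. A single string inside [] selects a column, not a row.

```python
import pandas as pd

# Small illustrative DataFrame; the values are arbitrary
df = pd.DataFrame({"A": [1, 5, 8, 9, 3, 7],
                   "B": [2.0, 3.6, 4.1, 0.5, 3.9, 2.2],
                   "C": [0.1, 1.5, 0.8, 2.0, 0.3, 0.9]})

first_three = df[0:3]   # rows at positions 0, 1, 2 (position 3 excluded)
from_four = df[4:]      # rows at positions 4 and 5
col_a = df["A"]         # a single label inside [] selects a column

print(first_three)
print(type(col_a).__name__)  # Series
```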
where (boolean lookup)
- Built on top of []: put a boolean condition inside the brackets to keep only the matching rows
```
df[df["A"]>7]
```
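Several conditions can be combined with & (and), | (or) and ~ (not). Each comparison needs its own parentheses, because & and | bind more tightly than > in Python. A sketch on made-up data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 5, 8, 9, 3, 7],
                   "B": [2.0, 3.6, 4.1, 0.5, 3.9, 2.2]})

mask = df["A"] > 7              # boolean Series aligned on the index
print(mask.tolist())            # [False, False, True, True, False, False]

big_a = df[mask]                # same as df[df["A"] > 7]
# Parentheses are required: & and | bind tighter than comparison operators
both = df[(df["A"] > 4) & (df["B"] > 3.5)]
not_big = df[~mask]             # ~ negates the mask
```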

isin

More flexible than where: it checks each value for membership in a list of values

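```
# Returns a boolean Series (True where the value is in the list)
s.isin([1,2,3])
df["A"].isin([1,2,3])

# Use the boolean Series as a row filter
df.loc[df['sepal_length'].isin([5.8,5.1])]
```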
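Negating isin with ~ gives a NOT IN filter. DataFrame.isin can also take a dict that maps column names to allowed values, and columns not in the dict come back all False. A sketch with made-up sepal_length and species values:

```python
import pandas as pd

df = pd.DataFrame({"sepal_length": [5.1, 4.9, 5.8, 6.3, 5.1],
                   "species": ["setosa", "setosa", "virginica", "virginica", "versicolor"]})

wanted = [5.8, 5.1]
in_rows = df[df["sepal_length"].isin(wanted)]       # like SQL: IN (5.8, 5.1)
not_in_rows = df[~df["sepal_length"].isin(wanted)]  # like SQL: NOT IN (5.8, 5.1)

# Dict form: each key is a column checked against its own list of values
flags = df.isin({"species": ["setosa"]})
```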

query
- Combines several where-style conditions into one string expression; & means AND, | means OR
```
df.query(" A>5.0 & (B>3.5 | C<1.0) ")
```
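query evaluates the string against the column names, and a local Python variable can be referenced with @. The sketch below uses made-up data and a hypothetical variable named threshold. It checks that query returns the same rows as the equivalent bracket expression:

```python
import pandas as pd

df = pd.DataFrame({"A": [6.0, 4.0, 7.5, 5.5],
                   "B": [3.8, 4.0, 2.0, 3.0],
                   "C": [2.0, 0.5, 0.2, 1.5]})

via_query = df.query("A > 5.0 & (B > 3.5 | C < 1.0)")
via_brackets = df[(df["A"] > 5.0) & ((df["B"] > 3.5) | (df["C"] < 1.0))]

threshold = 7.0                           # hypothetical variable name
above = df.query("A > @threshold")        # @ refers to a Python variable in scope
```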

loc: slicing by name (label)

Slicing by label

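```
# df.loc[A, B]: A is the row range, B is the column range
# A label slice includes both ends, so 1:4 selects the rows labelled 1, 2, 3 and 4
df.loc[1:4,['petal_length','petal_width']]
```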
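The inclusive end is the main practical difference from iloc. On a default integer index, df.loc[1:4, ...] returns four rows, while df.iloc[1:4] returns three. A sketch on illustrative data that reuses the iris column names from the text:

```python
import pandas as pd

df = pd.DataFrame({"petal_length": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7],
                   "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
                   "species": ["setosa"] * 6})

by_label = df.loc[1:4, ["petal_length", "petal_width"]]   # labels 1, 2, 3 and 4 (end included)
by_position = df.iloc[1:4, [0, 1]]                        # positions 1, 2, 3 (end excluded)

# With string labels, loc slices by those labels, again including the end
df_named = df.copy()
df_named.index = list("abcdef")
b_to_d = df_named.loc["b":"d", "petal_length"]
```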

Creating new variables

```
# Task 1: create a new variable test
# test = 1 if sepal_length > 6, otherwise test = 0
df.loc[df['sepal_length'] > 6, 'test'] = 1
df.loc[df['sepal_length'] <= 6, 'test'] = 0

# Task 2: create a new variable test2
# 1. petal_length > 2 and petal_width > 0.3 -> 1
# 2. sepal_length > 6 and sepal_width > 3 -> 2
# 3. everything else -> 0
# Rows meeting both conditions end up as 2, because that assignment runs last
df['test2'] = 0
df.loc[(df['petal_length']>2)&(df['petal_width']>0.3), 'test2'] = 1
df.loc[(df['sepal_length']>6)&(df['sepal_width']>3), 'test2'] = 2
```
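The same three-way rule can also be written in one step with numpy.select. This is a swapped-in alternative, not the author's method. np.select takes the first condition that is true, whereas in the loc version the later assignment (value 2) overwrites the earlier one. Condition 2 therefore has to come first to get identical results. A sketch on made-up values:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"sepal_length": [5.1, 6.5, 6.9, 6.2],
                   "sepal_width":  [3.5, 3.2, 2.8, 3.4],
                   "petal_length": [1.4, 4.6, 5.1, 1.5],
                   "petal_width":  [0.2, 1.5, 2.3, 0.2]})

cond1 = (df["petal_length"] > 2) & (df["petal_width"] > 0.3)
cond2 = (df["sepal_length"] > 6) & (df["sepal_width"] > 3)

# loc version from the text: the later assignment wins where both hold
df["test2"] = 0
df.loc[cond1, "test2"] = 1
df.loc[cond2, "test2"] = 2

# np.select picks the first matching condition, so cond2 is listed first
df["test2_select"] = np.select([cond2, cond1], [2, 1], default=0)
```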

iloc: slicing by position
- Slices by integer position (row/column number); the end is excluded, as with Python slices
```
df.iloc[1:4,:]
```
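iloc takes integers, slices, integer lists and negative positions on both axes, just like Python sequence indexing, and ignores the index labels entirely. A sketch on a made-up frame with string row labels:

```python
import pandas as pd

df = pd.DataFrame({"a": [10, 20, 30, 40, 50],
                   "b": [1, 2, 3, 4, 5],
                   "c": [0.1, 0.2, 0.3, 0.4, 0.5]},
                  index=["r0", "r1", "r2", "r3", "r4"])

rows_1_to_3 = df.iloc[1:4, :]      # positions 1, 2, 3 regardless of the labels
last_row = df.iloc[-1]             # negative positions count from the end
corner = df.iloc[[0, 2], [0, 2]]   # rows 0 and 2, columns 0 and 2
cell = df.iloc[2, 1]               # a single scalar (row r2, column b)
```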
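ix: mixed slicing
- Mixes labels and positions, but it is inefficient, so use it sparingly. .ix was deprecated in pandas 0.20 and removed in pandas 1.0; use loc or iloc instead
```
# pandas < 1.0 only: df1.ix[0:3,['sepal_length','petal_width']]
df1.loc[0:3, ['sepal_length', 'petal_width']]   # by label: rows 0..3 on a default integer index
df1.iloc[0:3, df1.columns.get_indexer(['sepal_length', 'petal_width'])]   # by position: rows 0..2
```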

map and lambda

```
alist = [1, 2, 3, 4]
# In Python 3, map returns a lazy iterator; list() materializes it
list(map(lambda s: s + 1, alist))
# [2, 3, 4, 5]
```

```
df['sepal_length'].map(lambda s: s * 2 + 1)[0:3]
# 0    11.2
# 1    10.8
# 2    10.4
# Name: sepal_length, dtype: float64
```
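Besides a function, Series.map also accepts a dict as a lookup table, and values missing from the dict become NaN. A sketch with made-up species labels and codes:

```python
import pandas as pd

s = pd.Series(["setosa", "versicolor", "virginica", "unknown"])

codes = s.map({"setosa": 0, "versicolor": 1, "virginica": 2})  # "unknown" has no key -> NaN
lengths = s.map(len)                                           # any callable works too
doubled = pd.Series([5.1, 4.9, 4.7]).map(lambda x: x * 2 + 1)
```

For plain arithmetic like the last line, the vectorized form `series * 2 + 1` gives the same result without calling a Python function per element.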

contains

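```
# Fuzzy-filter DataFrame rows (similar to SQL LIKE)
# Regex matching: * repeats the preceding token 0 or more times, ? makes it optional (0 or 1 time);
# placed right after *, as in .*?, the ? instead makes the repetition non-greedy
# ('套餐' is the plan column, '語音CDMA' means "voice CDMA")
df_obj[df_obj['套餐'].str.contains(r'.*?語音CDMA.*')]

# The two lines below give the same result: contains searches anywhere in the string,
# so wrapping the pattern in .* changes nothing ('商品名稱' = product name, '四件套' = four-piece set)
df[df['商品名稱'].str.contains("四件套")]
df[df['商品名稱'].str.contains(r".*四件套.*")]
```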
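A few str.contains parameters matter in practice. case=False ignores case, regex=False treats the pattern as a literal string, and na=False turns missing values into False so the result can be used directly as a row mask. A sketch with a made-up product-name column (the column name "name" is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"name": ["Cotton 4-piece set", "pillow", None, "SILK 4-PIECE SET", "a+b"]})

# Without na=False, the missing value would give NaN in the mask
has_set = df[df["name"].str.contains("4-piece set", case=False, na=False)]

# "+" is a regex metacharacter; regex=False matches it literally
has_plus = df[df["name"].str.contains("+", regex=False, na=False)]

# Like SQL LIKE 'pillow%': anchor the regex at the start with ^
starts_pillow = df[df["name"].str.contains(r"^pillow", na=False)]
```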
