Pandas Data Queries

Ways to query data in pandas

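df.loc: query by row and column labels
df.iloc: query by row and column integer positions
df.where: keep the DataFrame's shape and replace values that fail a condition with NaN
df.query: filter rows with a string expression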

.loc can both query data and overwrite (assign) it, so it is strongly recommended.
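A short sketch of the "write" side of `.loc`, assuming a toy DataFrame with invented values: the same selector used for reading can appear on the left of `=`, including with a boolean row mask. The `note` column is a hypothetical name used only for this example.

```python
import pandas as pd

# Toy data (hypothetical)
df = pd.DataFrame(
    {"bWendu": [3, 2, 6], "yWendu": [-6, -5, -8]},
    index=["2018-01-01", "2018-01-02", "2018-01-03"],
)

# Read one cell with .loc ...
before = df.loc["2018-01-02", "bWendu"]

# ... and overwrite the same cell with the same selector
df.loc["2018-01-02", "bWendu"] = 4

# Conditional write: creates the hypothetical "note" column,
# filled only on rows matching the mask (other rows get NaN)
df.loc[df["yWendu"] < -5, "note"] = "cold"

print(df)
```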

Ways to query data with df.loc in pandas

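Query with a single label value
Query in batch with a list of values
Query a range with a label slice
Query with a conditional expression
Query by calling a function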

Note:

All of the query methods above work on rows as well as on columns.
Watch how the result loses dimensions: DataFrame > Series > single value.
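A minimal sketch of the dimension reduction, assuming a toy DataFrame with invented values: lists on both axes keep a DataFrame, one scalar label drops to a Series, and scalar labels on both axes return a single value. Wrapping a single label in a list keeps that axis.

```python
import pandas as pd

# Toy data (hypothetical)
df = pd.DataFrame(
    {"bWendu": [3, 2, 6], "yWendu": [-6, -5, -8]},
    index=["2018-01-01", "2018-01-02", "2018-01-03"],
)

# Lists on both axes -> DataFrame (2-D)
frame = df.loc[["2018-01-01", "2018-01-02"], ["bWendu", "yWendu"]]

# One scalar label plus a list -> Series (1-D)
series = df.loc["2018-01-01", ["bWendu", "yWendu"]]

# Scalar labels on both axes -> single value (0-D)
value = df.loc["2018-01-01", "bWendu"]

# A single row label wrapped in a list keeps the row axis -> Series
kept = df.loc[["2018-01-01"], "bWendu"]

print(type(frame).__name__, type(series).__name__, type(value).__name__, type(kept).__name__)
```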

1. Read the data

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)

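print("First few rows:\n", df.head())

# Use the date column as the index so rows can be selected by date
df.set_index('ymd', inplace=True)

# Time series are covered in a later lesson; here the dates are handled as plain strings
print("Index:\n", df.index)

print("First few rows:\n", df.head())

# Strip the ℃ suffix from the temperatures and convert them to integers.
# Assign through df[col], not df.loc[:, col]: in pandas 2.x, df.loc[:, col] = ...
# writes the values in place and the column keeps its object dtype.
df['bWendu'] = df['bWendu'].str.replace('℃', '').astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '').astype('int32')

print("Column dtypes:\n", df.dtypes)

print("First few rows:\n", df.head())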
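The same preprocessing can be tried without the CSV file. This sketch assumes a hypothetical three-row sample shaped like the article's data (the column names come from the code above; every value is invented) and uses `read_csv(..., index_col="ymd")`, which sets the index at load time instead of calling `set_index` afterwards.

```python
import io

import pandas as pd

# Hypothetical sample in the same shape as the article's CSV (values invented)
csv_text = """ymd,bWendu,yWendu,tianqi,fengxiang,aqiLevel
2018-01-01,3℃,-6℃,晴,北风,2
2018-01-02,2℃,-5℃,多云,北风,1
2018-01-03,2℃,-5℃,晴,北风,1
"""

# index_col does the same job as set_index('ymd', inplace=True)
df = pd.read_csv(io.StringIO(csv_text), index_col="ymd")

# Strip the ℃ suffix literally and convert; assigning via df[col] changes the dtype
for col in ["bWendu", "yWendu"]:
    df[col] = df[col].str.replace("℃", "", regex=False).astype("int32")

print(df.dtypes)
```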

2. Query with a single label

Pass a single value for the rows, the columns, or both to get an exact match.

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)
# Data preprocessing
# Use the date column as the index so rows can be selected by date
df.set_index('ymd', inplace=True)
# Strip the ℃ suffix and convert to int (assigning via df[col] makes the dtype change)
df['bWendu'] = df['bWendu'].str.replace('℃', '').astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '').astype('int32')

# Print the first few rows
print(df.head())

# Get a single value (the max temperature on 2018-01-03)
print(df.loc['2018-01-03', 'bWendu'])

# Get a Series (the max and min temperatures on 2018-01-03)
print(df.loc['2018-01-03', ['bWendu', 'yWendu']])
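Because the index here holds plain strings rather than dates, "exact match" is literal. A small sketch with invented values shows that a differently formatted date string is simply not found:

```python
import pandas as pd

# Toy data (hypothetical); the index is a plain string index, not a DatetimeIndex
df = pd.DataFrame(
    {"bWendu": [3, 2, 2], "yWendu": [-6, -5, -5]},
    index=["2018-01-01", "2018-01-02", "2018-01-03"],
)

high = df.loc["2018-01-03", "bWendu"]                   # single value
high_low = df.loc["2018-01-03", ["bWendu", "yWendu"]]   # Series indexed by column name

# "2018-1-3" is a different string, so .loc raises KeyError
try:
    df.loc["2018-1-3", "bWendu"]
    found = True
except KeyError:
    found = False

print(high, list(high_low.index), found)
```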

3. Batch query with a list of values

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)
# Data preprocessing
# Use the date column as the index so rows can be selected by date
df.set_index('ymd', inplace=True)
# Strip the ℃ suffix and convert to int (assigning via df[col] makes the dtype change)
df['bWendu'] = df['bWendu'].str.replace('℃', '').astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '').astype('int32')

# Print the first few rows
print(df.head())

# Get a Series (max temperature for 2018-01-03, 2018-01-04 and 2018-01-05)
print(df.loc[['2018-01-03', '2018-01-04', '2018-01-05'], 'bWendu'])

# Get a DataFrame (max and min temperatures for the same three days)
print(df.loc[['2018-01-03', '2018-01-04', '2018-01-05'], ['bWendu', 'yWendu']])
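Two details of list lookups are worth knowing: the result follows the order of the list, and (since pandas 1.0) any label in the list that is not in the index raises `KeyError`. A sketch on invented data; `reindex` is shown as one way to get NaN for missing labels instead of an error.

```python
import pandas as pd

# Toy data (hypothetical)
df = pd.DataFrame(
    {"bWendu": [2, 2, 1], "yWendu": [-5, -5, -3]},
    index=["2018-01-03", "2018-01-04", "2018-01-05"],
)

days = ["2018-01-05", "2018-01-03"]
highs = df.loc[days, "bWendu"]               # Series, in the order given by `days`
both = df.loc[days, ["bWendu", "yWendu"]]    # DataFrame

# A list containing a label that is not in the index raises KeyError
try:
    df.loc[["2018-01-03", "2018-02-30"], "bWendu"]
    missing_ok = True
except KeyError:
    missing_ok = False

# reindex returns NaN for missing labels instead of raising
safe = df["bWendu"].reindex(["2018-01-03", "2018-02-30"])

print(highs, both, missing_ok, safe, sep="\n")
```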

4. Range query with a label slice

Note: a label slice includes both the start and the end.

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)
# Data preprocessing
# Use the date column as the index so rows can be selected by date
df.set_index('ymd', inplace=True)
# Strip the ℃ suffix and convert to int (assigning via df[col] makes the dtype change)
df['bWendu'] = df['bWendu'].str.replace('℃', '').astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '').astype('int32')

# Print the first few rows
print(df.head(), '\n', '*' * 50)

# Row labels as a range
print(df.loc['2018-01-03':'2018-01-05', 'bWendu'], '\n', '*' * 50)

# Column labels as a range
print(df.loc['2018-01-03', 'bWendu':'fengxiang'], '\n', '*' * 50)

# Ranges on both rows and columns
print(df.loc['2018-01-03':'2018-01-05', 'bWendu':'fengxiang'])
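The inclusive end is specific to label slicing. A quick comparison on invented data: `.loc` includes the end label, while `.iloc` slices by position and excludes the end, like ordinary Python lists.

```python
import pandas as pd

# Toy data (hypothetical), index sorted by date
df = pd.DataFrame(
    {"bWendu": [3, 2, 2, 2, 1], "yWendu": [-6, -5, -5, -5, -3]},
    index=["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04", "2018-01-05"],
)

# Label slice: both endpoints included -> 3 rows
by_label = df.loc["2018-01-03":"2018-01-05", "bWendu"]

# Position slice: end excluded -> rows at positions 2 and 3 only
by_position = df.iloc[2:4, 0]

print(by_label, by_position, sep="\n")
```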

5. Query with conditional expressions

The boolean list must be as long as the number of rows (or columns) it selects.

Simple condition

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)
# Data preprocessing
# Use the date column as the index so rows can be selected by date
df.set_index('ymd', inplace=True)
# Strip the ℃ suffix and convert to int (assigning via df[col] makes the dtype change)
df['bWendu'] = df['bWendu'].str.replace('℃', '').astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '').astype('int32')

# Simple query: rows where the minimum temperature is below -10
print(df.loc[df['yWendu'] < -10, :], '\n', '*' * 50)

# Look at the boolean condition itself
print(df['yWendu'] < -10, '\n', '*' * 50)
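A sketch of the length rule on invented data: the boolean Series has one entry per row, a plain Python list of the same length selects the same rows, and booleans indexed by column name select columns instead.

```python
import pandas as pd

# Toy data (hypothetical)
df = pd.DataFrame(
    {"bWendu": [-2, 1, -4], "yWendu": [-12, -8, -15]},
    index=["2018-01-22", "2018-01-23", "2018-01-24"],
)

mask = df["yWendu"] < -10          # boolean Series, one entry per row
cold = df.loc[mask, :]

# A plain list of booleans works too, as long as its length equals the row count
same = df.loc[[True, False, True], :]

# On the column axis: keep columns whose values are all below zero
all_negative = df.loc[:, (df < 0).all()]

print(cold, all_negative, sep="\n")
```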

Compound conditions

Note: combine conditions with &, and wrap each individual comparison in parentheses.

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)
# Data preprocessing
# Use the date column as the index so rows can be selected by date
df.set_index('ymd', inplace=True)
# Strip the ℃ suffix and convert to int (assigning via df[col] makes the dtype change)
df['bWendu'] = df['bWendu'].str.replace('℃', '').astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '').astype('int32')

# Rows where the max temperature is at most 30, the min temperature is at least 15,
# the weather is sunny (晴) and the air quality level is 1 (excellent)
print(df.loc[(df['bWendu'] <= 30) & (df['yWendu'] >= 15) & (df['tianqi'] == '晴') & (df['aqiLevel'] == 1), :])
print('*' * 50)

# Look at the boolean condition itself
print((df['bWendu'] <= 30) & (df['yWendu'] >= 15) & (df['tianqi'] == '晴') & (df['aqiLevel'] == 1))
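Why the parentheses are required: in Python, `&` binds more tightly than `<=` and `>=`, so an unparenthesized condition is parsed as a chained comparison that needs the truth value of a whole Series, and pandas raises `ValueError`. A sketch on invented data, which also shows `|` (or) and `~` (not), combined the same way:

```python
import pandas as pd

# Toy data (hypothetical)
df = pd.DataFrame(
    {"bWendu": [28, 31, 25], "yWendu": [18, 20, 12], "aqiLevel": [1, 1, 2]},
    index=["2018-09-01", "2018-09-02", "2018-09-03"],
)

# Correct: each comparison in parentheses, then combined with &
ok = df.loc[(df["bWendu"] <= 30) & (df["yWendu"] >= 15), :]

# Without parentheses this parses as bWendu <= (30 & yWendu) >= 15, a chained
# comparison that calls bool() on a Series, which raises ValueError
try:
    df.loc[df["bWendu"] <= 30 & df["yWendu"] >= 15, :]
    error = None
except ValueError as exc:
    error = exc

# | means "or", ~ means "not"; the same parenthesization rule applies
either = df.loc[(df["bWendu"] > 30) | ~(df["aqiLevel"] == 1), :]

print(ok, error, either, sep="\n")
```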

6. Query with a function

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)
# Data preprocessing
# Use the date column as the index so rows can be selected by date
df.set_index('ymd', inplace=True)
# Strip the ℃ suffix and convert to int (assigning via df[col] makes the dtype change)
df['bWendu'] = df['bWendu'].str.replace('℃', '').astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '').astype('int32')

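# Pass a lambda expression directly
print(df.loc[lambda df: (df['bWendu'] <= 30) & (df['yWendu'] >= 15), :])
print('*' * 50)

# Write your own function: September days with good air quality (aqiLevel == 1).
# The parentheses around the == comparison are required, because & binds tighter than ==.
def query_my_data(df):
    return df.index.str.startswith('2018-09') & (df['aqiLevel'] == 1)

print(df.loc[query_my_data, :])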
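`.loc` calls the function with the DataFrame and uses the boolean mask it returns. The sketch below, on invented data, also shows what goes wrong when the `==` comparison is left unparenthesized: the expression becomes `(in_september & aqiLevel) == 1`, and a September day with `aqiLevel` 3 slips through because `True & 3` evaluates to 1.

```python
import pandas as pd

# Toy data (hypothetical)
df = pd.DataFrame(
    {"aqiLevel": [1, 3, 1, 1]},
    index=["2018-08-31", "2018-09-01", "2018-09-02", "2018-10-01"],
)

def query_my_data(frame):
    # .loc passes the DataFrame in and uses the returned boolean mask
    in_september = frame.index.str.startswith("2018-09")
    return in_september & (frame["aqiLevel"] == 1)

good_september = df.loc[query_my_data, :]

# Unparenthesized version: & is applied before ==
buggy_mask = df.index.str.startswith("2018-09") & df["aqiLevel"] == 1
buggy = df.loc[buggy_mask, :]   # wrongly includes 2018-09-01 (aqiLevel 3)

print(good_september, buggy, sep="\n")
```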
