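When analyzing data, you often need to create new columns based on certain conditions and then use them in further analysis. This section covers the following approaches: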
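- Direct assignment
- The df.apply method
- The df.assign method
- Selecting rows by condition and assigning values to each group separately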
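First load the Beijing 2018 weather data and clean the temperature columns (bWendu is the daily high, yWendu the daily low):

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)
print(df.head())

# Use the date column as the index so rows can be filtered by date
df.set_index('ymd', inplace=True)

# Strip the ℃ suffix from the temperatures and convert them to int32
df['bWendu'] = df['bWendu'].str.replace('℃', '', regex=False).astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '', regex=False).astype('int32')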
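The CSV path above is relative to the author's project. The sketch below therefore reproduces the cleaning step on a tiny hand-made DataFrame. The rows are made up for illustration and are not taken from the real file. Like the code above, it assumes the raw temperatures are strings such as "3℃".

```python
import pandas as pd

# Hypothetical sample rows shaped like the weather CSV (values are made up)
df = pd.DataFrame({
    "ymd": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "bWendu": ["3℃", "2℃", "-2℃"],
    "yWendu": ["-6℃", "-5℃", "-12℃"],
})
df = df.set_index("ymd")

# Remove the ℃ suffix (literal match, not a regex) and convert to int32
for col in ["bWendu", "yWendu"]:
    df[col] = df[col].str.replace("℃", "", regex=False).astype("int32")

print(df.dtypes)
print(df)
```

This sketch converts the columns with `df[col] = ...`, which replaces the whole column, so the int32 dtype sticks. With `df.loc[:, col] = ...`, pandas 2.x first tries to write the new values into the existing object column in place, so the dtype may not change.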
Example: compute the temperature difference (direct assignment)
# df['bWendu'] is a Series, so the subtraction is element-wise and also returns a Series
df.loc[:, 'wencha'] = df['bWendu'] - df['yWendu']
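Direct assignment of a Series aligns on the index. pandas matches rows by index label rather than by position, and any row with no matching label gets NaN. The small sketch below shows this with made-up data; the `partial` column is hypothetical and exists only to demonstrate the alignment.

```python
import pandas as pd

# Made-up daily highs and lows, indexed by date
df = pd.DataFrame(
    {"bWendu": [3, 2, -2], "yWendu": [-6, -5, -12]},
    index=["2018-01-01", "2018-01-02", "2018-01-03"],
)

# Two Series with the same index: element-wise subtraction gives a new Series
df["wencha"] = df["bWendu"] - df["yWendu"]

# Hypothetical Series covering only two of the three dates, in a different order
partial = pd.Series([1.5, 2.5], index=["2018-01-03", "2018-01-01"])
# Values are placed by label, not position; 2018-01-02 has no match, so it gets NaN
df["partial"] = partial

print(df)
```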
Full code:
import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)

# Strip the ℃ suffix and convert to int32 (modifies existing columns)
df['bWendu'] = df['bWendu'].str.replace('℃', '', regex=False).astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '', regex=False).astype('int32')
print(df.head())
print('*' * 50, '\n')

# Compute the temperature difference (adds a new column)
# df['bWendu'] is a Series, so the subtraction also returns a Series
df.loc[:, 'wencha'] = df['bWendu'] - df['yWendu']
print(df.head())
Example: add a temperature-type column (df.apply)
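- If the daily high (bWendu) is above 33℃, the day is hot (高溫)
- If the daily low (yWendu) is below -10℃, the day is cold (低溫)
- Otherwise the day is normal (常溫)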
import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)

# Strip the ℃ suffix and convert to int32 (modifies existing columns)
df['bWendu'] = df['bWendu'].str.replace('℃', '', regex=False).astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '', regex=False).astype('int32')
print(df.head())
print('*' * 50, '\n')


def get_wendu_type(x):
    # x is one row; its index is the DataFrame's column labels
    if x['bWendu'] > 33:
        return "高溫"  # hot
    elif x['yWendu'] < -10:
        return "低溫"  # cold
    else:
        return "常溫"  # normal


# axis=1 passes each row (not each column) to the function as a Series
# whose index is the DataFrame's columns
df.loc[:, 'wendu_type'] = df.apply(get_wendu_type, axis=1)
# Print the first few rows
print(df.head())
print('*' * 50, '\n')
# Count how many days fall into each temperature type
print(df['wendu_type'].value_counts())
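`df.apply` with `axis=1` calls the Python function once per row. That is easy to read but can be slow on large tables. A common vectorized alternative is `numpy.select`, which is swapped in here and is not the article's method. It checks the conditions in order, just like the if/elif/else chain. The sketch below uses made-up rows and checks that both approaches give the same labels. The last row satisfies both conditions and is labeled hot, because the first true condition wins.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows: daily high bWendu and daily low yWendu, in ℃
df = pd.DataFrame({"bWendu": [35, 20, -5, 34], "yWendu": [25, 10, -15, -11]})


def get_wendu_type(x):
    # x is one row; its index is the column labels
    if x["bWendu"] > 33:
        return "高溫"  # hot
    elif x["yWendu"] < -10:
        return "低溫"  # cold
    else:
        return "常溫"  # normal


# Row-by-row version, as in the article
df["wendu_type"] = df.apply(get_wendu_type, axis=1)

# Vectorized version: for each row, the first condition that is True wins
conditions = [df["bWendu"] > 33, df["yWendu"] < -10]
choices = ["高溫", "低溫"]
df["wendu_type_vec"] = np.select(conditions, choices, default="常溫")

print(df)
print(df["wendu_type"].value_counts())
```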
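Example: add Fahrenheit versions of the temperatures with df.assign (it returns a new DataFrame and leaves df unchanged)

import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)

# Strip the ℃ suffix and convert to int32 (modifies existing columns)
df['bWendu'] = df['bWendu'].str.replace('℃', '', regex=False).astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '', regex=False).astype('int32')
print(df.head())
print('*' * 50, '\n')

# Each keyword becomes a new column; each lambda receives the DataFrame
# Celsius to Fahrenheit: F = C * 9 / 5 + 32
df_huashi = df.assign(
    yWendu_huashi=lambda x: x['yWendu'] * 9 / 5 + 32,
    bWendu_huashi=lambda x: x['bWendu'] * 9 / 5 + 32,
)
print(df_huashi.head())
print('*' * 50, '\n')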
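`df.assign` returns a new DataFrame, and each keyword argument may be a callable that receives the DataFrame being built. Since pandas 0.23, the keywords are evaluated in order, so a later keyword can refer to a column created earlier in the same call. The sketch below uses made-up values. The `wencha_huashi` column is a hypothetical extra added only to show that chaining.

```python
import pandas as pd

# Made-up temperatures in ℃
df = pd.DataFrame({"bWendu": [0, 100, 30], "yWendu": [-10, 20, 25]})

df_huashi = df.assign(
    # Celsius to Fahrenheit: F = C * 9 / 5 + 32
    yWendu_huashi=lambda x: x["yWendu"] * 9 / 5 + 32,
    bWendu_huashi=lambda x: x["bWendu"] * 9 / 5 + 32,
    # Hypothetical column that reuses the two columns created just above
    wencha_huashi=lambda x: x["bWendu_huashi"] - x["yWendu_huashi"],
)

print(df_huashi)
print(list(df.columns))  # the original DataFrame still has only its two columns
```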
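Select rows by condition first, then assign the new column's values to that subset
Example: if the difference between the daily high and low exceeds 10℃, mark the temperature range as large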
import pandas as pd

file_path = "../files/beijing_tianqi_2018.csv"
df = pd.read_csv(file_path)

# Strip the ℃ suffix and convert to int32 (modifies existing columns)
df['bWendu'] = df['bWendu'].str.replace('℃', '', regex=False).astype('int32')
df['yWendu'] = df['yWendu'].str.replace('℃', '', regex=False).astype('int32')
# Print the first few rows
print(df.head())
print('*' * 50, '\n')

# First create an empty column (direct assignment, the first way of creating a new column)
df['wencha_type'] = ""
# Then fill in each subset of rows selected by a boolean condition
df.loc[df['bWendu'] - df['yWendu'] > 10, 'wencha_type'] = "溫差大"  # large range
df.loc[df['bWendu'] - df['yWendu'] <= 10, 'wencha_type'] = "溫差正常"  # normal range
# Print the first few rows
print(df.head())
print('*' * 50, '\n')
# Count how many days fall into each range type
print(df['wencha_type'].value_counts())
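With only two outcomes, the same column can also be built in one step with `numpy.where`. This is a swapped-in alternative, not the article's method. The sketch below also computes the high-low difference once instead of twice. The data are made up, and the third row sits exactly on the 10℃ boundary, so it is labeled normal.

```python
import numpy as np
import pandas as pd

# Made-up daily highs and lows in ℃
df = pd.DataFrame({"bWendu": [3, 2, 15, 20], "yWendu": [-6, -9, 5, 14]})

# Compute the high-low difference once and reuse it
diff = df["bWendu"] - df["yWendu"]

# Article's approach: start with an empty column, then fill each subset via .loc
df["wencha_type"] = ""
df.loc[diff > 10, "wencha_type"] = "溫差大"  # large range
df.loc[diff <= 10, "wencha_type"] = "溫差正常"  # normal range

# One-step alternative: pick one of two labels element-wise
df["wencha_type_np"] = np.where(diff > 10, "溫差大", "溫差正常")

print(df)
print(df["wencha_type"].value_counts())
```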