Perform appropriate data cleaning on a given dataset
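import pandas as pd

data = {'Chinese': [66, 95, 93, 90, 80, 80],
        'English': [65, 85, 92, 88, 90, 90],
        'Math': [None, 98, 96, 77, 90, 90]}
df = pd.DataFrame(data,
                  index=['zhangfei', 'guanyu', 'zhaoyun', 'huangzhong', 'dianwei', 'dianwei'],
                  columns=['English', 'Math', 'Chinese'])
print('Constructed data:\n', df)

# Data cleaning
# Drop an unneeded row
df = df.drop(index=['guanyu'])
print('Data after dropping the row:\n', df)

# Remove duplicate rows (rows are compared by column values; the first occurrence is kept)
df = df.drop_duplicates()
print('Data after removing duplicates:\n', df)

# Change the data type: astype returns a new Series, so the result must be
# assigned to take effect. It is kept in a separate variable here so that
# 'Math' stays numeric for the total computed below.
math_as_str = df['Math'].astype('str')

# Check which columns contain null values
print('Columns containing null values:\n', df.isnull().any())

# Rename the columns
df.rename(columns={'English': 'yingyu', 'Math': 'shuxue', 'Chinese': 'yuwen'}, inplace=True)
print('Data after renaming:\n', df)

# Add a total-score column (NaN propagates: a missing score gives a NaN total)
df['sum1'] = df['yingyu'] + df['shuxue'] + df['yuwen']
print('Data with a total-score column added:\n', df)

Result:

Constructed data:
            English  Math  Chinese
zhangfei         65   NaN       66
guanyu           85  98.0       95
zhaoyun          92  96.0       93
huangzhong       88  77.0       90
dianwei          90  90.0       80
dianwei          90  90.0       80
Data after dropping the row:
            English  Math  Chinese
zhangfei         65   NaN       66
zhaoyun          92  96.0       93
huangzhong       88  77.0       90
dianwei          90  90.0       80
dianwei          90  90.0       80
Data after removing duplicates:
            English  Math  Chinese
zhangfei         65   NaN       66
zhaoyun          92  96.0       93
huangzhong       88  77.0       90
dianwei          90  90.0       80
Columns containing null values:
English    False
Math        True
Chinese    False
dtype: bool
Data after renaming:
            yingyu  shuxue  yuwen
zhangfei        65     NaN     66
zhaoyun         92    96.0     93
huangzhong      88    77.0     90
dianwei         90    90.0     80
Data with a total-score column added:
            yingyu  shuxue  yuwen   sum1
zhangfei        65     NaN     66    NaN
zhaoyun         92    96.0     93  281.0
huangzhong      88    77.0     90  255.0
dianwei         90    90.0     80  260.0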
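The total for zhangfei comes out as NaN because the math score is missing. The sketch below shows two possible ways to deal with that. It rebuilds the cleaned table by hand so that it runs on its own. Filling the gap with the column mean is only an assumed policy for illustration, not something the original exercise prescribes, and the names `score_cols`, `filled` and `sum_skipna` are hypothetical.

```python
import pandas as pd

# Same scores as the cleaned, renamed table above.
df = pd.DataFrame(
    {
        'yingyu': [65, 92, 88, 90],
        'shuxue': [None, 96, 77, 90],
        'yuwen': [66, 93, 90, 80],
    },
    index=['zhangfei', 'zhaoyun', 'huangzhong', 'dianwei'],
)
score_cols = ['yingyu', 'shuxue', 'yuwen']

# Option 1 (assumed policy): replace the missing math score with the mean
# of the known math scores, then add the columns as before.
filled = df.copy()
math_mean = filled['shuxue'].mean()  # mean() skips NaN by default
filled['shuxue'] = filled['shuxue'].fillna(math_mean)
filled['sum1'] = filled[score_cols].sum(axis=1)

# Option 2: leave the gap in place and let sum() skip it, which
# effectively counts the missing score as 0.
df['sum_skipna'] = df[score_cols].sum(axis=1)

print(filled)
print(df)
```

Which option is appropriate depends on what a missing score means in the data. Dropping the incomplete row with `dropna()` is a third possibility.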