Handling Missing Values in Pandas

Pandas provides the following functions for handling missing values:

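isnull and notnull: detect whether values are missing; available on both DataFrame and Series
dropna: drop (delete) missing values
- axis: whether to drop rows or columns, {0 or 'index', 1 or 'columns'}, default 0
- how: 'any' drops a row or column if any of its values is missing; 'all' drops it only if all of its values are missing
- inplace: if True, modify the current df; otherwise return a new df
fillna: fill in missing values
- value: the value to fill with, either a single value or a dict (key is the column name, value is the fill value)
- method: 'ffill' (forward fill) fills with the previous non-missing value; 'bfill' (backward fill) fills with the next non-missing value. pandas 2.1 and later deprecate this parameter in favor of the ffill() and bfill() methods
- axis: whether to fill along rows or columns, {0 or 'index', 1 or 'columns'}
- inplace: if True, modify the current df; otherwise return a new df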
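
The parameters listed above can be tried out on a small in-memory DataFrame before touching a real file. The following is only a minimal sketch: the column names a, b, c and all values are made up for illustration and do not come from the Excel file used later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "c" is entirely empty, row 2 is entirely empty.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": ["x", "y", None, None],
    "c": [np.nan, np.nan, np.nan, np.nan],
})

# isnull / notnull return boolean masks with the same shape as the input.
null_mask = df.isnull()
a_present = df["a"].notnull()

# how="all" removes only the columns (or rows) in which every value is missing.
no_empty_cols = df.dropna(axis="columns", how="all")
no_empty_rows = no_empty_cols.dropna(axis="index", how="all")

# how="any" removes a row as soon as a single value in it is missing.
complete_rows = no_empty_cols.dropna(axis="index", how="any")

# fillna with a dict fills each named column with its own value.
filled = no_empty_rows.fillna({"a": 0, "b": "unknown"})

# Forward fill propagates the last non-missing value downwards.
forward_filled = no_empty_rows["b"].ffill()

print(filled)
print(forward_filled)
```

None of these calls use inplace=True, so df itself is left unchanged and each step returns a new object.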

Example: reading, cleaning, and processing an irregular Excel file

import pandas as pd

# Step 1: skip the leading blank rows when reading the Excel file
print('*' * 25, 'Step 1: skip the leading blank rows when reading the Excel file', '*' * 25)
file_path = "../datas/student_excel/student_excel.xlsx"
studf = pd.read_excel(file_path, skiprows=2)
print(studf)

# Step 2: detect missing values ('分數' is the score column)
print('*' * 25, 'Step 2: detect missing values', '*' * 25)
print(studf.isnull())
print('*' * 25, 'Mask: score is missing', '*' * 25)
print(studf['分數'].isnull())
print('*' * 25, 'Mask: score is not missing', '*' * 25)
print(studf['分數'].notnull())
print('*' * 25, 'All rows where the score is not missing', '*' * 25)
print(studf.loc[studf['分數'].notnull(), :])

# Step 3: drop columns in which every value is missing
studf.dropna(axis='columns', how='all', inplace=True)
print('*' * 25, 'Step 3: drop columns in which every value is missing', '*' * 25)
print(studf)

# Step 4: drop rows in which every value is missing
studf.dropna(axis='index', how='all', inplace=True)
print('*' * 25, 'Step 4: drop rows in which every value is missing', '*' * 25)
print(studf)

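# Step 5: fill missing scores with 0
# studf.fillna({"分數": 0}) on its own has no effect: it returns a new df that must be assigned back
studf.loc[:, '分數'] = studf['分數'].fillna(0)  # same result as studf = studf.fillna({"分數": 0})
print('*' * 25, 'Step 5: fill missing scores with 0', '*' * 25)
print(studf)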

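# Step 6: forward-fill missing names ('姓名' is the name column)
# ffill() replaces fillna(method='ffill'), which is deprecated since pandas 2.1
studf.loc[:, '姓名'] = studf['姓名'].ffill()
print('*' * 25, 'Step 6: forward-fill missing names', '*' * 25)
print(studf)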

# Step 7: save the cleaned data to a new Excel file
print('*' * 25, 'Step 7: save the cleaned data to a new Excel file', '*' * 25)
studf.to_excel("../datas/student_excel/student_excel_clean.xlsx", index=False)
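
For readers who do not have the Excel file at hand, steps 3 to 6 can be reproduced on an in-memory frame. This is a rough sketch under stated assumptions: the frame raw, its English column names (name, subject, score), its values, and the helper clean are all hypothetical stand-ins for what read_excel might return for a sheet with merged name cells, not the actual contents of student_excel.xlsx.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sheet after read_excel(..., skiprows=2):
# merged "name" cells arrive as NaN, and a blank column and a blank row remain.
raw = pd.DataFrame({
    "Unnamed: 0": [np.nan] * 5,
    "name": ["Ann", np.nan, np.nan, "Bob", np.nan],
    "subject": ["math", "english", np.nan, "math", "english"],
    "score": [85.0, np.nan, np.nan, 90.0, 70.0],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy; the input frame is left unchanged."""
    out = df.dropna(axis="columns", how="all")   # drop fully empty columns
    out = out.dropna(axis="index", how="all")    # drop fully empty rows
    out = out.fillna({"score": 0})               # the returned frame must be captured
    out["name"] = out["name"].ffill()            # repeat each name down its block
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Reassigning the result of each call instead of using inplace=True keeps raw available for comparison, which can be handy when checking that the cleaning did what was intended.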
