import pandas as pd
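1. duplicated: mark duplicate values
By default the first occurrence of each value is treated as not a duplicate, and later occurrences are marked True: duplicated(keep='first')
# duplicated marks duplicate values. To leave the first occurrence unmarked use keep='first'; to leave the last occurrence unmarked use keep='last'; to mark every occurrence of a repeated value use keep=False.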
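animals = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama'])
# keep='first': later repeats are True -> [False, False, True, False, True]
animals1 = animals.duplicated(keep='first')
print(animals1)
# keep='last': earlier repeats are True -> [True, False, True, False, False]
animals2 = animals.duplicated(keep='last')
print(animals2)
# keep=False: every occurrence of a repeated value is True -> [True, False, True, False, True]
animals3 = animals.duplicated(keep=False)
print(animals3)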
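To make the three keep rules concrete, here is a minimal sketch that reimplements them in plain Python and checks the result against pandas. The helper name mark_duplicates is hypothetical and is not part of pandas; it only illustrates the marking logic for hashable values.

```python
import pandas as pd
from collections import Counter

def mark_duplicates(values, keep="first"):
    """Hypothetical helper: plain-Python version of the keep= rules of Series.duplicated."""
    values = list(values)
    if keep is False:
        # keep=False: mark every value that occurs more than once
        counts = Counter(values)
        return [counts[v] > 1 for v in values]
    if keep == "first":
        # keep='first': a value is a duplicate if it already appeared earlier
        seen = set()
        marks = []
        for v in values:
            marks.append(v in seen)
            seen.add(v)
        return marks
    if keep == "last":
        # keep='last': apply the 'first' rule to the reversed list, then reverse back
        return mark_duplicates(values[::-1], keep="first")[::-1]
    raise ValueError("keep must be 'first', 'last' or False")

animals = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama'])
for keep in ("first", "last", False):
    # compare the sketch with pandas for each keep option
    assert mark_duplicates(animals, keep=keep) == animals.duplicated(keep=keep).tolist()
    print(keep, mark_duplicates(animals, keep=keep))
```

The 'last' rule is just the 'first' rule read from the end, which is why the sketch reuses it on the reversed list.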
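2. drop_duplicates: remove duplicate values
By default the first occurrence is kept: drop_duplicates(keep='first', inplace=False). Pass inplace=True to modify the original data directly instead of returning a new object.
# drop_duplicates removes duplicate values. To keep the first occurrence use keep='first'; to keep the last occurrence use keep='last'.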
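# keep='first' keeps index 0 ('lama'), 1 ('cow'), 3 ('beetle')
animals_d1 = animals.drop_duplicates(keep='first')
print(animals_d1)
# keep='last' keeps index 1 ('cow'), 3 ('beetle'), 4 ('lama')
animals_d2 = animals.drop_duplicates(keep='last')
print(animals_d2)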
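A short sketch of the remaining options on the same data: keep=False drops every value that repeats, drop_duplicates keeps the same rows as boolean indexing with ~duplicated, and inplace=True changes the Series itself and returns None. The variable names here are illustrative only.

```python
import pandas as pd

animals = pd.Series(['lama', 'cow', 'lama', 'beetle', 'lama'])

# keep=False removes every occurrence of a repeated value
unique_only = animals.drop_duplicates(keep=False)
print(unique_only.tolist())

# drop_duplicates(keep=k) selects the same rows as animals[~animals.duplicated(keep=k)]
for keep in ("first", "last", False):
    assert animals.drop_duplicates(keep=keep).equals(animals[~animals.duplicated(keep=keep)])

# inplace=True modifies the Series it is called on and returns None
s = animals.copy()
result = s.drop_duplicates(inplace=True)
print(result)
print(s.tolist())
```

Working on a copy keeps the original animals Series intact for the earlier examples.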