DataFrame.drop_duplicates: remove duplicate rows from a DataFrame

This article is reposted (see the original). 2020-10-26 11:08, 1656 views

DataFrame.drop_duplicates(subset=None, keep='first', inplace=False, ignore_index=False) [source]

Return DataFrame with duplicate rows removed.

Considering only certain columns is optional. Indexes, including time indexes, are ignored.
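A minimal sketch, assuming pandas is installed: the hypothetical frame below gives rows 'a' and 'b' different index labels but identical values. Because only column values are compared, 'b' is still treated as a duplicate of 'a', and passing subset narrows the comparison to the listed columns.

```python
import pandas as pd

# Hypothetical data: rows 'a' and 'b' differ only in their index labels.
df = pd.DataFrame(
    {
        "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie"],
        "rating": [4.0, 4.0, 3.5, 5.0],
    },
    index=["a", "b", "c", "d"],
)

# The index is ignored, so row 'b' counts as a duplicate of row 'a'.
deduped = df.drop_duplicates()
print(deduped.index.tolist())  # ['a', 'c', 'd']

# Comparing on the 'brand' column only also removes row 'd'.
by_brand = df.drop_duplicates(subset=["brand"])
print(by_brand.index.tolist())  # ['a', 'c']
```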

Parameters

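subset: column label or sequence of labels, optional. Only consider certain columns for identifying duplicates; by default, all of the columns are used.
keep: {'first', 'last', False}, default 'first'. Determines which duplicates (if any) to keep. 'first': drop duplicates except for the first occurrence. 'last': drop duplicates except for the last occurrence. False: drop all duplicates.
inplace: bool, default False. Whether to drop duplicates in place or to return a copy.
ignore_index: bool, default False. If True, the resulting axis will be labeled 0, 1, …, n - 1.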

New in version 1.0.0.
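To make the three keep options and ignore_index concrete, here is a small sketch on a hypothetical frame in which rows 0, 1 and 3 are identical. It assumes pandas 1.0.0 or later, since ignore_index was added in that version.

```python
import pandas as pd

# Hypothetical data: rows 0, 1 and 3 are identical; row 2 is unique.
df = pd.DataFrame({"key": ["x", "x", "y", "x"], "val": [1, 1, 2, 1]})

# keep='first' (default): keep the first occurrence of each duplicate group.
first = df.drop_duplicates(keep="first")
# keep='last': keep the last occurrence instead.
last = df.drop_duplicates(keep="last")
# keep=False: drop every row that has a duplicate anywhere in the frame.
none_kept = df.drop_duplicates(keep=False)

print(first.index.tolist())      # [0, 2]
print(last.index.tolist())       # [2, 3]
print(none_kept.index.tolist())  # [2]

# ignore_index=True relabels the result 0, 1, ..., n - 1.
relabeled = df.drop_duplicates(keep="last", ignore_index=True)
print(relabeled.index.tolist())  # [0, 1]
```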

Returns

DataFrame or None: DataFrame with duplicates removed, or None if inplace=True.
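A short sketch of the inplace=True case, using a hypothetical frame: the call mutates the original DataFrame and returns None, so its return value should not be assigned back to the variable.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are duplicates.
df = pd.DataFrame({"a": [1, 1, 2], "b": ["p", "p", "q"]})

# With inplace=True the method modifies df itself and returns None.
result = df.drop_duplicates(inplace=True)
print(result)             # None
print(df.index.tolist())  # [0, 2]
```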

See also

DataFrame.value_counts: Count unique combinations of columns.
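The two methods are related: value_counts reports how often each unique row combination occurs, while drop_duplicates keeps one row per combination. A sketch, assuming pandas 1.1.0 or later (where DataFrame.value_counts is available) and reusing the brand and style columns from the ramen example:

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
})

# value_counts tallies each unique (brand, style) combination.
counts = df.value_counts()
print(counts)

# One row per combination survives drop_duplicates, so the sizes match.
assert len(counts) == len(df.drop_duplicates())
```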

Examples

Consider a dataset containing ramen ratings.

 
```python
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
...     'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
...     'rating': [4, 4, 3.5, 15, 5]
... })
>>> df
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
```

By default, it removes duplicate rows based on all columns.

 
```python
>>> df.drop_duplicates()
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0
```

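The default, all-column behavior can be cross-checked against DataFrame.duplicated, which marks each row that repeats an earlier one. This sketch assumes pandas is installed and rebuilds the ramen frame so it runs on its own; keeping the unmarked rows yields the same frame as drop_duplicates().

```python
import pandas as pd

df = pd.DataFrame({
    "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
    "style": ["cup", "cup", "cup", "pack", "pack"],
    "rating": [4, 4, 3.5, 15, 5],
})

# duplicated() flags every row that repeats an earlier row (keep='first').
mask = df.duplicated()
print(mask.tolist())  # [False, True, False, False, False]

# Keeping only the unflagged rows matches drop_duplicates().
manual = df[~mask]
assert manual.equals(df.drop_duplicates())
```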
To remove duplicates on specific column(s), use subset.

 
```python
>>> df.drop_duplicates(subset=['brand'])
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
```

To remove duplicates and keep the last occurrences, use keep.

 
```python
>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
     brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0
```
