Problems encountered when processing data with pandas

Reposted article. 2017-09-11 23:36. Tag: Python for data analysis

The data is a pandas DataFrame whose columns include FirstCab, Weight and Ticket.

1. For each row where FirstCab is missing, multiply Weight by 0.8

Method 1 (works): df.loc[df['FirstCab'].isnull(),'Weight'] *= 0.8

Method 2 (works): df['Weight'] = np.where(df['FirstCab'].isnull(),df['Weight']*0.8,df['Weight'])
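As a minimal sketch of the two working methods, the snippet below applies both to a small made-up DataFrame. Only the column names FirstCab and Weight come from the text; the sample values and the names df1 and df2 are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({
    "FirstCab": ["C", np.nan, "E", np.nan],
    "Weight": [1.0, 2.0, 3.0, 4.0],
})

# Method 1: pass the boolean mask and the column label to a single .loc call,
# so the in-place multiplication is written back to the frame itself.
df1 = df.copy()
df1.loc[df1["FirstCab"].isnull(), "Weight"] *= 0.8

# Method 2: np.where picks, element by element, the scaled value where
# FirstCab is missing and the original value everywhere else.
df2 = df.copy()
df2["Weight"] = np.where(df2["FirstCab"].isnull(), df2["Weight"] * 0.8, df2["Weight"])

print(df1)
print(df2)
```

Both approaches touch only the rows where FirstCab is missing. Method 1 modifies the column in place, while Method 2 rebuilds the whole column.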

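Method 3 (does not work): df[df['FirstCab'].isnull()]['Weight'] *= 0.8 or df.loc[df['FirstCab'].isnull(),:]['Weight'] *= 0.8

Warning raised: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead. This is a SettingWithCopyWarning rather than an exception: the chained indexing returns a copy, so the multiplication never reaches df.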
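The following sketch shows, on made-up data, why the chained form is treated as not working: the statement runs, but the original frame is left untouched. The sample values and the names mask and unchanged are assumptions. The warning is silenced here only to keep the demonstration output clean.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({"FirstCab": ["C", np.nan], "Weight": [10.0, 20.0]})
mask = df["FirstCab"].isnull()

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning for this demo
    # df[mask] builds a new object; the multiplication is written back to
    # that temporary object, not to df.
    df[mask]["Weight"] *= 0.8

# Weight in df still holds the original values.
unchanged = df["Weight"].tolist()
print(unchanged)
```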

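Method 4 (does not work): df.assign(Weight=lambda x: x['Weight']*0.8 if x['FirstCab'].isnull() else x['Weight'])

or: df.assign(Weight=lambda x: x['Weight']*0.8 if x['FirstCab'] == np.nan else x['Weight'])

Error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). The lambda receives the whole DataFrame, so the condition is a Series, which a Python if/else cannot evaluate as a single True or False.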
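If the assign style is preferred, one possible way to keep it is to make the choice element-wise inside the lambda. The sketch below uses Series.mask for that; this is a swapped-in technique, not one of the article's methods. The sample data and the names nan_matches and result are assumptions. It also shows that comparing with np.nan finds nothing, since NaN is not equal to any value, itself included.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the article.
df = pd.DataFrame({"FirstCab": ["C", np.nan], "Weight": [10.0, 20.0]})

# An == np.nan comparison is False for every element, so it cannot be used
# to detect missing values; isnull() is the appropriate test.
nan_matches = int((df["FirstCab"] == np.nan).sum())

# Series.mask replaces values where the condition is True, element by element,
# so no whole Series is ever evaluated in a plain Python `if`.
result = df.assign(
    Weight=lambda x: x["Weight"].mask(x["FirstCab"].isnull(), x["Weight"] * 0.8)
)

print(nan_matches)
print(result)
```

Unlike Method 1, assign returns a new DataFrame and leaves df as it was.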

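2. Extract the trailing number from the Ticket column so that it can be compared numerically later

Ticket values look like A/5. 2151, PP 9549, 333223 or LINE. Extract the trailing number, and return NaN when there is none. Note that the pattern below captures the first run of 3 to 8 digits, which for these values is also the trailing number.

Method 1 (works): df['NumTic']= df['Ticket'].str.extract(r'(\d{3,8})',expand=False).astype(float)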

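Method 2 (does not work): df['NumTic']= df['Ticket'].str.extract(r'(\d{3,8})',expand=False)

Problem: the extracted values are stored as strings, so numeric comparisons and arithmetic fail.

Method 3 (does not work): df['NumTic']= df['Ticket'].str.extract(r'(\d{3,8})',expand=False).astype(int)

Error: NaN cannot be converted to int.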
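The extraction can be tried on the four sample Ticket values from the text. The sketch below follows Method 1. As an optional extra that is not part of the article, it also converts the result to the pandas nullable integer dtype "Int64", which can hold missing values. The names tickets, num_tic and num_int are assumptions.

```python
import pandas as pd

# The four example Ticket values quoted in the article.
tickets = pd.Series(["A/5. 2151", "PP 9549", "333223", "LINE"])

# Method 1: capture a run of 3 to 8 digits, then cast to float so that
# rows without a match can stay NaN.
num_tic = tickets.str.extract(r"(\d{3,8})", expand=False).astype(float)

# Optional alternative: the nullable integer dtype stores whole numbers
# and represents the missing entry as <NA>.
num_int = num_tic.astype("Int64")

# Numeric comparison now works; the missing entry compares as False.
is_large = num_tic > 5000

print(num_tic)
print(num_int)
print(is_large)
```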

3. Assigning to the index raises ValueError: Length mismatch: Expected axis has 678 elements, new values have 677 elements

The code:

attrValues = df['NumTic'].sort_values()
# remove duplicates
attrValues = attrValues.drop_duplicates()
# replace the index
attrValues.index = range(attrValues.count()) ### the error is raised here

Cause: attrValues contains one NaN value, and attrValues.count() does not count NaN, so attrValues.index has one more element than attrValues.count().

Fix:

Drop the rows of df where NumTic is NaN in the first line, i.e. attrValues = df[df['NumTic'].notnull()].NumTic.sort_values().
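The mismatch can be reproduced on a small made-up Series. The sketch below compares count() with len() and then shows one possible variant of the fix that uses dropna() and reset_index(drop=True). That variant is an assumption, not the article's code, and the sample values and snake_case names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df['NumTic'], containing one NaN and one duplicate.
num_tic = pd.Series([3.0, np.nan, 1.0, 3.0, 2.0])

attr_values = num_tic.sort_values().drop_duplicates()

# count() skips NaN, while len() counts every element, so the two differ
# by one here. Assigning range(count()) as the index would not fit.
n_count = int(attr_values.count())
n_len = len(attr_values)

# One possible fix: drop NaN first, then let pandas renumber the index.
clean = num_tic.dropna().sort_values().drop_duplicates().reset_index(drop=True)

print(n_count, n_len)
print(clean)
```

Using reset_index(drop=True) avoids computing the length by hand, so the new index always matches the number of elements.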
