The data is stored in a pandas DataFrame (df).
1. For each row where FirstCab is null, multiply Weight by 0.8
Method 1 (works): df.loc[df['FirstCab'].isnull(), 'Weight'] *= 0.8
Method 2 (works): df['Weight'] = np.where(df['FirstCab'].isnull(), df['Weight']*0.8, df['Weight'])
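Method 3 (does not work): df[df['FirstCab'].isnull()]['Weight'] *= 0.8 or df.loc[df['FirstCab'].isnull(), :]['Weight'] *= 0.8
Warning raised (SettingWithCopyWarning): A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead. The chained indexing first selects the rows into a temporary copy, so the multiplication is applied to that copy and df is left unchanged.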
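Method 4 (does not work): df.assign(Weight=lambda x: x['Weight']*0.8 if x['FirstCab'].isnull() else x['Weight'])
or df.assign(Weight=lambda x: x['Weight']*0.8 if x['FirstCab'] == np.nan else x['Weight'])
Error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). The lambda receives the whole DataFrame, so the condition is a boolean Series, and a plain if cannot reduce it to a single True or False. In addition, == np.nan is False for every element, because NaN never compares equal to anything; use isnull() to test for missing values.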
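The two working methods, plus a vectorized variant of method 4, can be sketched as follows. The sample values are made up for illustration (only the column names FirstCab and Weight come from the text above), and the use of Series.where inside assign is one possible fix rather than the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the text.
df = pd.DataFrame({
    "FirstCab": ["C", np.nan, "E", np.nan],
    "Weight": [1.0, 2.0, 3.0, 4.0],
})

# Boolean mask: True on the rows where FirstCab is missing.
mask = df["FirstCab"].isnull()

# Method 1: a single .loc call selects rows and column together,
# so the update is written to the DataFrame itself.
by_loc = df.copy()
by_loc.loc[mask, "Weight"] *= 0.8

# Method 2: np.where chooses element by element between two candidates.
by_where = df.copy()
by_where["Weight"] = np.where(mask, by_where["Weight"] * 0.8, by_where["Weight"])

# Vectorized variant of method 4: assign returns a new DataFrame.
# Series.where keeps the value where the condition is True and
# takes the replacement (Weight * 0.8) where it is False.
by_assign = df.assign(
    Weight=lambda x: x["Weight"].where(x["FirstCab"].notnull(), x["Weight"] * 0.8)
)

print(by_assign)
```

All three results should agree, and the original df is untouched because each variant works on a copy or returns a new DataFrame.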
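2. Extract the number at the end of the Ticket column, so that the values can be compared later
Ticket values look like A/5. 2151, PP 9549, 333223, or LINE. Extract the trailing number, returning NaN when there is none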
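Method 1 (works): df['NumTic'] = df['Ticket'].str.extract(r'(\d{3,8})', expand=False).astype(float)
Note that this pattern captures the first run of 3 to 8 digits in the string, which happens to be the trailing number for values like those above; a pattern anchored with $, such as r'(\d{3,8})$', would force the match to sit at the end of the string.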
Method 2 (does not work): df['NumTic'] = df['Ticket'].str.extract(r'(\d{3,8})', expand=False)
Problem: the extracted values are stored as strings, so numeric comparisons and arithmetic on them go wrong
Method 3 (does not work): df['NumTic'] = df['Ticket'].str.extract(r'(\d{3,8})', expand=False).astype(int)
Error: NaN cannot be converted to int
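A minimal sketch of the extraction, using the four example Ticket values from the text. The NumTicInt column is an addition that is not in the original notes: it assumes a pandas version that supports the nullable "Int64" dtype, which can hold missing values where plain int cannot.

```python
import pandas as pd

# Sample values taken from the examples in the text.
df = pd.DataFrame({"Ticket": ["A/5. 2151", "PP 9549", "333223", "LINE"]})

# Capture the first run of 3 to 8 digits; rows without a match become missing.
# expand=False returns a Series instead of a one-column DataFrame.
digits = df["Ticket"].str.extract(r"(\d{3,8})", expand=False)

# Method 1: float can represent NaN, so the conversion succeeds.
df["NumTic"] = digits.astype(float)

# Assumed alternative: convert to numbers first, then to the nullable
# integer dtype, which stores the missing entry as <NA>.
df["NumTicInt"] = pd.to_numeric(digits).astype("Int64")

# Numeric comparison now works; the missing row compares as False.
print(df[df["NumTic"] > 5000])
```

Keeping the column as float is the simplest choice when the values are only compared or sorted; the nullable integer column is worth considering when the numbers should print without a decimal point.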
3. Assigning to the index raises ValueError: Length mismatch: Expected axis has 678 elements, new values have 677 elements
The code:
attrValues = df['NumTic'].sort_values()
# drop duplicates
attrValues = attrValues.drop_duplicates()
# reset the index
attrValues.index = range(attrValues.count())  # the error is raised here
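Cause: attrValues still contains one NaN (drop_duplicates collapses all NaN values into a single one). attrValues.count() does not count NaN values, so attrValues.index has one more element than attrValues.count().
Fix:
Drop the rows of df where NumTic is NaN in the first line, i.e. attrValues = df[df['NumTic'].notnull()].NumTic.sort_values().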
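The mismatch can be reproduced on a small example. The values below are made up for illustration, and the dropna / reset_index(drop=True) chain is an assumed alternative way of writing the fix, not the code from the original notes.

```python
import numpy as np
import pandas as pd

# Hypothetical NumTic values with duplicates and missing entries.
df = pd.DataFrame({"NumTic": [9549.0, 2151.0, np.nan, 2151.0, 333223.0, np.nan]})

# Same steps as in the failing code: sort, then drop duplicates.
# sort_values puts NaN last, and drop_duplicates keeps a single NaN.
attrValues = df["NumTic"].sort_values().drop_duplicates()

n_count = attrValues.count()  # non-null values only
n_len = len(attrValues)       # every row, including the NaN
print(n_count, n_len)

# Assumed alternative fix: drop NaN up front, then let pandas
# rebuild a 0..n-1 index instead of assigning a range by hand.
clean = (
    df["NumTic"]
    .dropna()
    .sort_values()
    .drop_duplicates()
    .reset_index(drop=True)
)
print(clean)
```

Using reset_index(drop=True) avoids computing the length at all, so the new index cannot disagree with the number of rows.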