O2O Coupon Usage Prediction

This article is a repost. 2018-06-12 19:28, category: Algorithms

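Preface:

This is a problem from Tianchi's newcomer practice competition; the original page is https://tianchi.aliyun.com/getStart/information.htm?spm=5176.100067.5678.2.e1321db7ydQmSB&raceId=231593 . The sections below walk through the following steps.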

1. Data preprocessing

2. Feature selection

3. Algorithm description

4. Results analysis

5. Miscellaneous

Part 1: Data preprocessing

The raw data can be downloaded from the link above. The .csv files can be processed with pandas.

For example:

import pandas as pd
dfoff = pd.read_csv('ccf_offline_stage1_train.csv', keep_default_na=False)

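The keep_default_na parameter defaults to True. When it is True, the string 'null' in the file is read as NaN, so a test such as dfoff['Date'] != 'null' no longer works as intended. Setting keep_default_na=False keeps 'null' as a literal string, so it can be compared with == and !=.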
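The difference is easy to see on a tiny in-memory file. The sketch below is only an illustration: the two rows are made up, and only the column names follow the article.

```python
import io
import pandas as pd

# Two-row stand-in for the competition file; the values are invented for illustration.
csv_text = "User_id,Date_received,Date\n1,20160101,null\n2,null,20160105\n"

# Default behaviour: the literal string 'null' is parsed as a missing value (NaN).
df_default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False: 'null' stays an ordinary string.
df_raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)

print(df_default['Date'].isna().tolist())   # missing-value mask under the default
print((df_raw['Date'] == 'null').tolist())  # string comparison works on the raw read
```

With the default settings the comparison against 'null' would have to be replaced by isna() or notna(); the article instead keeps the strings and compares them directly.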

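We need to relate coupons to purchases and derive the label from that relationship.

There are four combinations:

Number of records with a coupon and a purchase
Number of records without a coupon but with a purchase
Number of records with a coupon but no purchase
Number of records with neither a coupon nor a purchase

The code is as follows: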

print('Coupon, purchase:', dfoff[(dfoff['Date_received'] != 'null') & (dfoff['Date'] != 'null')].shape[0])
print('No coupon, purchase:', dfoff[(dfoff['Date_received'] == 'null') & (dfoff['Date'] != 'null')].shape[0])
print('Coupon, no purchase:', dfoff[(dfoff['Date_received'] != 'null') & (dfoff['Date'] == 'null')].shape[0])
print('No coupon, no purchase:', dfoff[(dfoff['Date_received'] == 'null') & (dfoff['Date'] == 'null')].shape[0])
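The same four counts can also be read off a single cross-tabulation. The following is a minimal sketch on made-up rows; the label strings ('coupon', 'purchase', and so on) are chosen here for readability and are not part of the dataset.

```python
import pandas as pd

# Toy rows standing in for dfoff (illustrative values only).
df = pd.DataFrame({
    'Date_received': ['20160101', 'null', '20160102', 'null', '20160103'],
    'Date':          ['20160105', '20160106', 'null', 'null', 'null'],
})

# Turn the two 'null' checks into readable category labels.
coupon = (df['Date_received'] != 'null').map({True: 'coupon', False: 'no coupon'}).rename('coupon')
purchase = (df['Date'] != 'null').map({True: 'purchase', False: 'no purchase'}).rename('purchase')

# One table holding all four combinations.
counts = pd.crosstab(coupon, purchase)
print(counts)
```

Each cell of counts corresponds to one of the four print statements above.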

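The file contains "spend X, get Y off" coupons, which need to be converted into discount rates. The distance to the store also needs to be converted to a number, and so on.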

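from datetime import date

def convertRate(row):
    # Convert a raw Discount_rate string into a discount rate.
    if row == 'null':
        return 1.0  # no coupon, so no discount
    elif ':' in row:
        # 'X:Y' means Y off when spending X, so the rate is 1 - Y/X
        rows = row.split(':')
        return 1.0 - float(rows[1])/float(rows[0])
    else:
        return float(row)

def getDiscountMan(row):
    # Spending threshold X of an 'X:Y' coupon, otherwise 0
    if ':' in row:
        rows = row.split(':')
        return int(rows[0])
    else:
        return 0

def getDiscountJian(row):
    # Reduction Y of an 'X:Y' coupon, otherwise 0
    if ':' in row:
        rows = row.split(':')
        return int(rows[1])
    else:
        return 0

def getDiscountType(row):
    # Called by processData below but not shown in the original article.
    # Assumed definition: 'null' for no coupon, 1 for 'X:Y' coupons, 0 for direct rates.
    if row == 'null':
        return 'null'
    elif ':' in row:
        return 1
    else:
        return 0

def getWeekday(row):
    # Day of week for a 'YYYYMMDD' string (Monday=1 ... Sunday=7); 'null' is passed through
    if row == 'null':
        return row
    else:
        return date(int(row[0:4]), int(row[4:6]), int(row[6:8])).weekday() + 1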


def processData(df):
    df['discount_rate'] = df['Discount_rate'].apply(convertRate)
    df['discount_man'] = df['Discount_rate'].apply(getDiscountMan)
    df['discount_jian'] = df['Discount_rate'].apply(getDiscountJian)
    df['discount_type'] = df['Discount_rate'].apply(getDiscountType)
    print(df['discount_rate'].unique())

    df['distance'] = df['Distance'].replace('null', -1).astype(int)
    return df

Calling dfoff = processData(dfoff) applies all of the formatting above.
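As a worked example of the rate conversion: a coupon written as 150:20 takes 20 off a purchase of 150, so its rate is 1 - 20/150, roughly 0.867. The sketch below uses a hypothetical helper named to_rate that mirrors convertRate, applied to a few made-up values.

```python
import pandas as pd

def to_rate(value: str) -> float:
    """Hypothetical helper mirroring convertRate from the article."""
    if value == 'null':
        return 1.0                      # no coupon: full price
    if ':' in value:
        threshold, reduction = value.split(':')
        return 1.0 - float(reduction) / float(threshold)
    return float(value)                 # already a rate such as '0.95'

# Made-up Discount_rate strings covering the three input formats.
samples = pd.Series(['150:20', '0.95', 'null', '20:5'])
rates = samples.apply(to_rate)
print(rates.round(4).tolist())
```

Mapping 'null' to 1.0 treats "no coupon" as "no discount", which keeps the column numeric.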

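Note the use of apply() in the code: it is among the most flexible functions in pandas. The code above calls it on a single column (Series.apply), which passes each value to the function. The DataFrame version has the following signature in older pandas releases (later releases removed broadcast and reduce in favor of result_type):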

DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds)
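To make the role of axis concrete, here is a small sketch on invented numbers. The column names follow the article; the saving_ratio function is hypothetical and exists only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'discount_man': [150, 20, 0], 'discount_jian': [20, 5, 0]})

# Series.apply: the function receives one cell value at a time.
is_threshold = df['discount_man'].apply(lambda man: man > 0)

# DataFrame.apply with axis=1: the function receives one row (a Series) at a time.
def saving_ratio(row):
    if row['discount_man'] == 0:
        return 0.0
    return row['discount_jian'] / row['discount_man']

ratio = df.apply(saving_ratio, axis=1)

# DataFrame.apply with the default axis=0: the function receives one column at a time.
col_max = df.apply(lambda col: col.max())

print(is_threshold.tolist())
print(ratio.tolist())
print(col_max.to_dict())
```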

Processing the dates on which coupons were received:

date_received = dfoff['Date_received'].unique()  # .unique() drops duplicates
date_received = sorted(date_received[date_received != 'null'])  # sort
print('Coupons were received from', date_received[0], 'to', date_received[-1])  # earliest and latest date

The consumption dates are handled the same way:

date_buy = dfoff['Date'].unique()
date_buy = sorted(date_buy[date_buy != 'null'])
print('Purchases were made from', date_buy[0], 'to', date_buy[-1])

Plot the coupons issued against the coupons used:

import matplotlib.pyplot as plt
import seaborn as sns
# coupons received per day
couponbydate = dfoff[dfoff['Date_received'] != 'null'][['Date_received', 'Date']].groupby(['Date_received'], as_index=False).count()
couponbydate.columns = ['Date_received','count']
# coupons used per day, grouped by the day they were received
buybydate = dfoff[(dfoff['Date'] != 'null') & (dfoff['Date_received'] != 'null')][['Date_received', 'Date']].groupby(['Date_received'], as_index=False).count()
buybydate.columns = ['Date_received','count']

sns.set_style('ticks')
sns.set_context("notebook", font_scale= 1.4)
plt.figure(figsize = (12,8))
date_received_dt = pd.to_datetime(date_received, format='%Y%m%d')

plt.subplot(211)
plt.bar(date_received_dt, couponbydate['count'], label = 'number of coupon received' )
plt.bar(date_received_dt, buybydate['count'], label = 'number of coupon used')
plt.yscale('log')
plt.ylabel('Count')
plt.legend()

plt.subplot(212)
plt.bar(date_received_dt, buybydate['count']/couponbydate['count'])
plt.ylabel('Ratio(coupon used/coupon received)')
plt.tight_layout()
plt.show()

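This produces a two-panel figure: the daily counts of coupons received and used (log scale) on top, and the used/received ratio below.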
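The plotting code builds couponbydate and buybydate separately and relies on both having one row for every date in date_received. If some date had no redeemed coupons, buybydate would be shorter and the bars would no longer line up. As a hedged alternative (not the article's method), both counts can come from a single groupby so they stay aligned. The rows below are made up.

```python
import pandas as pd

# Toy rows standing in for dfoff (illustrative values only).
df = pd.DataFrame({
    'Date_received': ['20160101', '20160101', '20160102', '20160102', 'null'],
    'Date':          ['20160103', 'null',     'null',     'null',     '20160104'],
})

# Keep only rows where a coupon was received, and flag the ones that were used.
with_coupon = df[df['Date_received'] != 'null'].copy()
with_coupon['used'] = (with_coupon['Date'] != 'null').astype(int)

# One row per receive date: number received, number used, and their ratio.
daily = with_coupon.groupby('Date_received')['used'].agg(received='size', used='sum')
daily['ratio'] = daily['used'] / daily['received']
daily.index = pd.to_datetime(daily.index, format='%Y%m%d')
print(daily)
```

A date with no redeemed coupons stays in the table with used equal to 0, so daily['received'], daily['used'] and daily['ratio'] can be passed to plt.bar against daily.index directly.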

Part 2: Feature selection

Part 3: Algorithm description

Part 4: Results analysis

Part 5: Miscellaneous
