CTR Learning Notes & Code Implementation 2 - Deep CTR Models: MLP -> Wide&Deep


Background

This post starts from the basic deep CTR models. I really like the Wide&Deep framework; many later improvements can be viewed as fitting into it. The Wide part is responsible for mining frequent patterns that appear in the samples, while the Deep part is responsible for generalizing to feature combinations that never appear in the samples. Subsequent improvements either use a different IFC (feature-interaction component) so that Deep extracts feature-interaction information more effectively, or let Wide memorize sample information better.

The code below takes dense input, which I find makes the model structure easier to understand. The models for sparse input and the complete code are here 👇
https://github.com/DSXiangLi/CTR

Embedding + MLP

The earliest deep-learning attempts at click-through-rate modeling started with a simple MLP: embed the high-dimensional sparse discrete features, concatenate the embeddings as the MLP input, and pass them through several fully connected layers with non-linear transformations to get the click-through-rate prediction.

Have you ever wondered, like I did, what exactly this Embedding + MLP learns? Do the MLP's embeddings learn the same feature-interaction information as FM's embeddings? I recently heard a fairly convincing view from an expert; of course, keep skeptical, and feel free to discuss~

An MLP can, in principle, represent both low-order and high-order information of all features, but it relies on an enormous search space. With limited samples and limited parameters, it usually learns only limited information. That is why we rely on feature engineering grounded in business understanding to help the MLP learn more effective feature-interaction information within a limited space. FM's vector inner product is just one way of doing second-order feature engineering. Many later improvements to the Deep part likewise explore how to apply feature-engineering experience to extract feature-interaction information more effectively.
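
To make the contrast concrete: FM hard-codes the second-order interaction as an inner product of embeddings, which even has a cheap closed form, while an MLP feeds the concatenated embeddings through dense layers and has to search for such interactions. A minimal numpy sketch of the FM second-order term (fm_second_order is illustrative, not from the repo):

import numpy as np

def fm_second_order(x, v):
    # x: (n_features,) raw feature values; v: (n_features, k) latent embeddings
    vx = v * x[:, None]                          # v_i * x_i, shape (n_features, k)
    square_of_sum = np.square(vx.sum(axis=0))    # (sum_i v_i x_i)^2 per latent dim
    sum_of_square = np.square(vx).sum(axis=0)    # sum_i (v_i x_i)^2 per latent dim
    return 0.5 * (square_of_sum - sum_of_square).sum()

x = np.array([1.0, 0.0, 1.0])
v = np.random.randn(3, 4)
print(fm_second_order(x, v))   # equals sum_{i<j} <v_i, v_j> x_i x_j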

Code Implementation

import tensorflow as tf

def build_features(numeric_handle):
    f_sparse = []
    f_dense = []

    # categorical features: hash, then one-hot encode
    for col, config in EMB_CONFIGS.items():
        ind = tf.feature_column.categorical_column_with_hash_bucket(col, hash_bucket_size = config['hash_size'])
        one_hot = tf.feature_column.indicator_column(ind)
        f_sparse.append(one_hot)

    if numeric_handle == 'bucketize':
        # Method 1 'bucketize': bucketize numeric features into one-hot columns
        for col, config in BUCKET_CONFIGS.items():
            num = tf.feature_column.numeric_column(col)
            bucket = tf.feature_column.bucketized_column(num, boundaries = config)
            f_sparse.append(bucket)
    else:
        # Method 2 'dense': keep numeric features as-is; they get concatenated with the embeddings in model_fn
        for col in BUCKET_CONFIGS:
            f_dense.append(tf.feature_column.numeric_column(col))

    return f_sparse, f_dense

@tf_estimator_model
def model_fn(features, labels, mode, params):
    sparse_columns, dense_columns = build_features(params['numeric_handle'])

    with tf.variable_scope('EmbeddingInput'):
        embedding_input = []
        # manual embedding lookup: multiply the one-hot/multi-hot input by a learned weight matrix
        for f_sparse in sparse_columns:
            sparse_input = tf.feature_column.input_layer(features, f_sparse)

            input_dim = sparse_input.get_shape().as_list()[-1]

            init = tf.random_normal(shape = [input_dim, params['embedding_dim']])

            weight = tf.get_variable('w_{}'.format(f_sparse.name), dtype = tf.float32, initializer = init)

            embedding_input.append( tf.matmul(sparse_input, weight) )

        dense = tf.concat(embedding_input, axis=1, name = 'embedding_concat')

        # if numeric features are treated as dense, concatenate them with the embeddings here;
        # otherwise they were already bucketized into the sparse input above
        if params['numeric_handle'] == 'dense':
            numeric_input = tf.feature_column.input_layer(features, dense_columns)

            # BatchNorm on the raw numeric features; note the train op must include
            # tf.GraphKeys.UPDATE_OPS so the BN moving statistics get updated
            numeric_input = tf.layers.batch_normalization(numeric_input, center = True, scale = True, trainable = True,
                                                          training = (mode == tf.estimator.ModeKeys.TRAIN))
            dense = tf.concat([dense, numeric_input], axis = 1, name ='numeric_concat')

    with tf.variable_scope('MLP'):
        for i, unit in enumerate(params['hidden_units']):
            dense = tf.layers.dense(dense, units = unit, activation = tf.nn.relu, name = 'Dense_{}'.format(i))
            # dropout is only active in training mode via the training flag
            dense = tf.layers.dropout(dense, rate = params['dropout_rate'], training = (mode == tf.estimator.ModeKeys.TRAIN))

    with tf.variable_scope('output'):
        y = tf.layers.dense(dense, units=1, name = 'output')

    return y
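
A minimal usage sketch, assuming the repo's @tf_estimator_model decorator wraps the returned logits into a full EstimatorSpec and that a train_input_fn is defined elsewhere in the repo; the model_dir path is a placeholder:

params = {
    'numeric_handle': 'dense',    # or 'bucketize'
    'embedding_dim': 16,
    'hidden_units': [48, 32, 16],
    'dropout_rate': 0.1,
}
estimator = tf.estimator.Estimator(
    model_fn = model_fn,
    model_dir = './checkpoint/mlp',   # placeholder path
    params = params)
# estimator.train(input_fn = train_input_fn)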

Wide&Deep

Wide&Deep adds a Wide part on top of the MLP above. The authors argue that the Deep part handles generalization, i.e. generalizing to patterns unseen in the samples and fuzzy matching; it is exactly the Embedding + MLP described above. The Wide part handles memorization, i.e. memorizing patterns already present in the samples, and is a logistic regression over discrete features and feature crosses. Deep and Wide are trained jointly.

Strictly speaking this is not entirely accurate. The authors also note in the paper that the Wide part is only meant as icing on the cake, helping Deep sharpen the discriminative power of patterns that occur frequently in the samples with respect to the prediction target. So Wide does not need to be a full-size model; rather, it should hold the core features and cross features chosen by business judgment.
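
For reference, the paper [2] combines the two parts under a single sigmoid, where $\phi(\mathbf{x})$ denotes the cross-product transformations of the Wide part and $a^{(l_f)}$ is the final activation of the Deep part:

$$P(Y=1 \mid \mathbf{x}) = \sigma\left(\mathbf{w}_{wide}^{\top}\,[\mathbf{x}, \phi(\mathbf{x})] + \mathbf{w}_{deep}^{\top}\, a^{(l_f)} + b\right)$$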

Handling Continuous Features

CTR models mostly focus on handling sparse discrete features, so what should we do with continuous features? There are several options (see the sketch after this list):

  1. Discretize the continuous feature; after that you can do embedding/onehot/cross
  2. Leave the continuous feature untouched and concatenate it directly with the embedding vectors of the other discrete features as input. Here the continuous feature needs to be normalized, otherwise convergence will be very slow. The MLP above tried BatchNorm, while Wide&Deep normalizes directly inside the feature_column
  3. Use it both as a continuous input and, after discretization, for interactions with other discrete features
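
A minimal feature_column sketch of the three options ('price' and the boundary/mean/std values are placeholders, not from the repo):

import tensorflow as tf

price = tf.feature_column.numeric_column('price')

# Option 1: discretize, then one-hot / embed / cross
price_bucket = tf.feature_column.bucketized_column(price, boundaries = [10.0, 50.0, 100.0])
price_emb = tf.feature_column.embedding_column(price_bucket, dimension = 4)

# Option 2: keep it continuous, normalizing inside the feature_column
price_norm = tf.feature_column.numeric_column(
    'price', normalizer_fn = lambda x: (x - 50.0) / 20.0)

# Option 3: feed both views, the normalized value and the bucketized one
both = [price_norm, price_bucket]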

Pros and cons of discretizing continuous features
Cons

  1. Information loss; how much is lost depends on how well the buckets are chosen
  2. Reduced smoothness: a small change in a feature near a bucket boundary can cause a relatively large swing in the prediction

Pros

  1. Adds non-linearity: in most cases the relationship between a continuous feature and the target is not linear, but rather crosses some threshold at which the effect on the user flips 0/1
  2. More robust: effectively avoids problems with extreme values and long tails in continuous features
  3. Feature interaction: once discretized, it is easy to build further cross features
  4. Less hassle... no more worrying about whether the distribution is normal or whether to normalize

Code Implementation

from itertools import combinations
import tensorflow as tf

def znorm(mean, std):
    # returns a closure used as normalizer_fn inside numeric_column
    def znorm_helper(col):
        return (col - mean) / std
    return znorm_helper

def build_features():
    f_onehot = []
    f_embedding = []
    f_numeric = []

    # categorical features
    for col, config in EMB_CONFIGS.items():
        ind = tf.feature_column.categorical_column_with_hash_bucket(col, hash_bucket_size = config['hash_size'])
        f_onehot.append( tf.feature_column.indicator_column(ind))
        f_embedding.append( tf.feature_column.embedding_column(ind, dimension = config['emb_size']) )

    # numeric features: used both as continuous inputs and bucketized into discrete features
    for col, config in BUCKET_CONFIGS.items():
        num = tf.feature_column.numeric_column(col,
                                               normalizer_fn = znorm(NORM_CONFIGS[col]['mean'],NORM_CONFIGS[col]['std'] ))
        f_numeric.append(num)
        bucket = tf.feature_column.bucketized_column( num, boundaries=config )
        f_onehot.append(bucket)

    # crossed features
    for col1, col2 in combinations(f_onehot, 2):
        # crossed_column cannot take an indicator_column directly:
        # if col is the indicator of a hashed bucket, cross on the raw feature name instead
        if col1.parents[0].name in EMB_CONFIGS.keys():
            col1 = col1.parents[0].name
        if col2.parents[0].name in EMB_CONFIGS.keys():
            col2 = col2.parents[0].name

        crossed = tf.feature_column.crossed_column([col1, col2], hash_bucket_size = 20)
        f_onehot.append(tf.feature_column.indicator_column(crossed))

    f_dense = f_embedding + f_numeric    #f_dense = f_embedding + f_numeric + f_onehot
    f_sparse = f_onehot     #f_sparse = f_onehot + f_numeric

    return f_sparse, f_dense
    
def build_estimator(model_dir):
    sparse_feature, dense_feature= build_features()

    run_config = tf.estimator.RunConfig(
        save_summary_steps=50,
        log_step_count_steps=50,
        keep_checkpoint_max = 3,
        save_checkpoints_steps =50
    )

    dnn_optimizer = tf.train.ProximalAdagradOptimizer(
                    learning_rate= 0.001,
                    l1_regularization_strength=0.001,
                    l2_regularization_strength=0.001
    )

    estimator = tf.estimator.DNNLinearCombinedClassifier(
        model_dir=model_dir,
        linear_feature_columns=sparse_feature,
        dnn_feature_columns=dense_feature,
        dnn_optimizer = dnn_optimizer,
        dnn_dropout = 0.1,
        batch_norm = False,
        dnn_hidden_units = [48,32,16],
        config=run_config )

    return estimator
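
A hedged usage sketch; input_fn here is a placeholder for the repo's data pipeline, and the model_dir path is hypothetical:

estimator = build_estimator(model_dir = './checkpoint/wide_deep')   # placeholder path

train_spec = tf.estimator.TrainSpec(input_fn = lambda: input_fn('train'), max_steps = 10000)
eval_spec = tf.estimator.EvalSpec(input_fn = lambda: input_fn('valid'), steps = None)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)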

The CTR Learning Notes & Code Implementation series 👇

https://github.com/DSXiangLi/CTR

CTR Learning Notes & Code Implementation 1 - The Prelude to Deep Learning: LR -> FFM

References

  1. Weinan Zhang, Tianming Du, and Jun Wang. Deep Learning over Multi-field Categorical Data: A Case Study on User Response Prediction. ECIR, 2016.
  2. Heng-Tze Cheng, Levent Koc, Jeremiah Harmsen, et al. Wide & Deep Learning for Recommender Systems. Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, ACM, 2016: 7-10.
  3. https://www.jiqizhixin.com/articles/2018-07-16-17
  4. https://cloud.tencent.com/developer/article/1063010
  5. https://github.com/shenweichen/DeepCTR

