1. PyTorch implementation of the AFM model.
$\hat{y}_{AFM}=w_{0} + \sum_{i=1}^{n}w_{i}x_{i}+p^{T}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}a_{ij}(v_{i}v_{j})x_{i}x_{j}$
$a_{ij}^{'}=h^{T}Relu(W(v_{i}v_{j})x_{i}x_{j}+b)$
$a_{ij}=\frac{exp(a_{ij}^{'})}{\sum_{i,j}exp(a_{ij}^{'})}$
(Real data would be fed through a DataLoader, which requires setting batch_size and other parameters.)
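A minimal sketch of such a setup, assuming the features are already encoded as integer indices and paired with labels (the toy data and variable names here are illustrative only):
import torch
from torch.utils.data import TensorDataset, DataLoader

X = torch.LongTensor([[1, 13, 22], [0, 18, 29], [2, 13, 27], [0, 11, 22], [1, 14, 26]])   # samples * num_fields
y = torch.FloatTensor([1, 0, 1, 0, 1])                                                    # binary labels
loader = DataLoader(TensorDataset(X, y), batch_size=2, shuffle=True)

for batch_x, batch_y in loader:
    pass   # batch_x: batch_size * num_fields, batch_y: batch_size
In the rest of this walkthrough a fixed batch is built by hand instead.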
Suppose the original data has num_fields = 3 features, which after one-hot encoding span 30 dimensions in total, and the embedding dimension is set to ebd_size = 4. The embedding layer is then defined as
ebd_size = 4
ebd = nn.Embedding(30,ebd_size)
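A note on the indices used below (the field layout here is an assumption inferred from the sample data, not stated in the original): all three fields share the single 30-row embedding table, so each field's raw category index is shifted by the total width of the preceding fields, e.g. 10 categories per field giving offsets 0, 10, 20.
field_dims = [10, 10, 10]                                          # hypothetical per-field vocabulary sizes, summing to 30
offsets = [sum(field_dims[:i]) for i in range(len(field_dims))]    # -> [0, 10, 20]
raw = torch.LongTensor([[1, 3, 2]])                                # per-field category indices
shifted = raw + torch.LongTensor(offsets)                          # -> tensor([[ 1, 13, 22]]), valid indices into ebd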
Define a batch of data by hand (standing in for one DataLoader batch):
x_ = torch.LongTensor([[1, 13, 22], [0, 18, 29], [2, 13, 27], [0, 11, 22], [1, 14, 26]])  # shape = batch_size * num_fields = 5 * 3
Look up the corresponding embedding vectors:
x = ebd(x_)  # batch_size * num_fields * ebd_size
Compute the cross (pairwise interaction) features:
$(v_{i}v_{j})x_{i}x_{j}$
Number of cross features = num_fields * (num_fields - 1) / 2; with num_fields = 3 this gives 3 pairs: (0,1), (0,2), (1,2).
inner_product has shape batch_size * num_cross_features * ebd_size:
num_fields = x.shape[1]
row, col = list(), list()
for i in range(num_fields - 1):
    for j in range(i + 1, num_fields):
        row.append(i), col.append(j)
p, q = x[:, row], x[:, col]      # each: batch_size * num_cross_features * ebd_size
inner_product = p * q            # element-wise product (v_i v_j) x_i x_j
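As an aside (an alternative not used in the original code), the same (row, col) index pairs can be generated in one call with torch.triu_indices:
row, col = torch.triu_indices(num_fields, num_fields, offset=1)   # upper-triangular pairs (i, j) with i < j
inner_product = x[:, row] * x[:, col]                             # identical result to the explicit double loop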
Next, compute
$Relu(W(v_{i}v_{j})x_{i}x_{j}+b)$
This can be done with an nn.Linear layer followed by a ReLU activation.
The result of attention(inner_product) has shape batch_size * num_cross_features * ebd_size.
attention = torch.nn.Linear(ebd_size, ebd_size)
print(attention(inner_product))     # batch_size * num_cross_features * ebd_size
attn_scores = F.relu(attention(inner_product))
print("attn_scores", attn_scores)   # batch_size * num_cross_features * ebd_size
Then another linear layer yields $a_{ij}^{'}$:
$a_{ij}^{'}=h^{T}Relu(W(v_{i}v_{j})x_{i}x_{j}+b)$
projection = torch.nn.Linear(ebd_size, 1)
print("projection(attn_scores)", projection(attn_scores))   # batch_size * num_cross_features * 1
Then a softmax gives
$a_{ij}=\frac{exp(a_{ij}^{'})}{\sum_{i,j}exp(a_{ij}^{'})}$
attn_scores = F.softmax(projection(attn_scores), dim=1)   # normalize over the cross-feature pairs
print("attn_scores", attn_scores)                          # batch_size * num_cross_features * 1
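As a quick sanity check (not in the original): the softmax is taken over dim=1, so each sample's attention weights should sum to 1 across the cross-feature pairs.
print(attn_scores.sum(dim=1))   # batch_size * 1, each entry approximately 1.0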
Next, multiply the cross features $(v_{i}v_{j})x_{i}x_{j}$ by the attention weights $a_{ij}$ and sum over the pairs:
print("attn_scores * inner_product", attn_scores * inner_product) # batch_size*交叉特征*嵌入維度 attn_output = torch.sum(attn_scores * inner_product, dim=1) print("attn_output", attn_output) # batch_size*嵌入維度
Finally, pass the result through a fully connected layer with output size 1:
fc = torch.nn.Linear(ebd_size, 1)
fc_out = fc(attn_output)
print("fc_out", fc_out)   # batch_size * 1
This computes $p^{T}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}a_{ij}(v_{i}v_{j})x_{i}x_{j}$; the first-order part at the front can be obtained with a single Linear layer, as sketched below.
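A minimal sketch of that first-order term (an assumption about one common implementation, not code from the original): since the inputs here are index-encoded rather than dense one-hot vectors, the equivalent of a Linear layer over the 30 one-hot dimensions is an nn.Embedding with output size 1 (one scalar weight per one-hot dimension) plus a bias, summed over the fields.
# hypothetical first-order (linear) part: w_0 + sum_i w_i x_i
first_order_w = nn.Embedding(30, 1)                    # one scalar weight per one-hot dimension
bias = nn.Parameter(torch.zeros(1))                    # w_0
linear_part = first_order_w(x_).sum(dim=1) + bias      # batch_size * 1
y_hat = linear_part + fc_out                           # \hat{y}_AFM, batch_size * 1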
Reference code:
import torch
import torch.nn as nn
import torch.nn.functional as F

ebd_size = 4
ebd = nn.Embedding(30, ebd_size)

x_ = torch.LongTensor([[1, 13, 22], [0, 18, 29], [2, 13, 27], [0, 11, 22], [1, 14, 26]])
x = ebd(x_)                                               # batch_size * num_fields * ebd_size

num_fields = x.shape[1]
row, col = list(), list()
for i in range(num_fields - 1):
    for j in range(i + 1, num_fields):
        row.append(i), col.append(j)
p, q = x[:, row], x[:, col]
inner_product = p * q
print("inner_product", inner_product)                     # batch_size * num_cross_features * ebd_size

attention = torch.nn.Linear(ebd_size, ebd_size)
print(attention(inner_product))                           # batch_size * num_cross_features * ebd_size
attn_scores = F.relu(attention(inner_product))
print("attn_scores", attn_scores)                         # batch_size * num_cross_features * ebd_size

projection = torch.nn.Linear(ebd_size, 1)
print("projection(attn_scores)", projection(attn_scores)) # batch_size * num_cross_features * 1
attn_scores = F.softmax(projection(attn_scores), dim=1)
print("attn_scores", attn_scores)                         # batch_size * num_cross_features * 1

print("attn_scores * inner_product", attn_scores * inner_product)  # batch_size * num_cross_features * ebd_size
attn_output = torch.sum(attn_scores * inner_product, dim=1)
print("attn_output", attn_output)                         # batch_size * ebd_size

fc = torch.nn.Linear(ebd_size, 1)
fc_out = fc(attn_output)
print("fc_out", fc_out)                                   # batch_size * 1
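To tie the pieces together, here is a minimal sketch of the same computation wrapped in an nn.Module, with the first-order term included (the class name, the nn.Embedding(30, 1) trick for the first-order weights, and the default sizes are illustrative assumptions, not part of the original):
class AFM(nn.Module):
    def __init__(self, num_onehot_dims=30, ebd_size=4):
        super().__init__()
        self.ebd = nn.Embedding(num_onehot_dims, ebd_size)       # second-order embeddings v_i
        self.first_order = nn.Embedding(num_onehot_dims, 1)      # first-order weights w_i
        self.bias = nn.Parameter(torch.zeros(1))                 # w_0
        self.attention = nn.Linear(ebd_size, ebd_size)           # W, b
        self.projection = nn.Linear(ebd_size, 1)                 # h
        self.fc = nn.Linear(ebd_size, 1)                         # p

    def forward(self, x_):
        x = self.ebd(x_)                                         # batch * num_fields * ebd_size
        num_fields = x.shape[1]
        row, col = list(), list()
        for i in range(num_fields - 1):
            for j in range(i + 1, num_fields):
                row.append(i), col.append(j)
        inner_product = x[:, row] * x[:, col]                    # batch * num_pairs * ebd_size
        attn_scores = F.relu(self.attention(inner_product))
        attn_scores = F.softmax(self.projection(attn_scores), dim=1)   # a_ij, batch * num_pairs * 1
        attn_output = torch.sum(attn_scores * inner_product, dim=1)    # batch * ebd_size
        second_order = self.fc(attn_output)                      # batch * 1
        linear_part = self.first_order(x_).sum(dim=1) + self.bias      # batch * 1
        return linear_part + second_order                        # \hat{y}_AFM, batch * 1

model = AFM()
print(model(x_))   # batch_size * 1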