Reposted from: https://blog.csdn.net/xiaosongshine/article/details/90600028

1. Self-Attention in Detail

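In self-attention, the three matrices Q (Query), K (Key), and V (Value) all come from the same input. We first compute the dot product of Q and K; then, to keep the result from growing too large, we divide it by a scaling factor \sqrt{d_k}, where d_k is the dimension of a query or key vector. A softmax then normalizes the result into a probability distribution, and multiplying that by the matrix V gives the weighted-sum representation. The operation can be written as

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$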
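To make the role of the scaling factor concrete, here is a minimal NumPy sketch of the steps described above. The sizes, the random inputs, and the helper names `softmax` and `scaled_dot_product_attention` are illustrative assumptions, not part of the original post. It compares the attention weights with and without the division by \sqrt{d_k}.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(0)   # fixed seed, illustrative data only
n, d_k = 5, 64                   # hypothetical sizes: 5 vectors of width 64
Q = rng.normal(size=(n, d_k))
K = rng.normal(size=(n, d_k))
V = rng.normal(size=(n, d_k))

output, weights = scaled_dot_product_attention(Q, K, V)
unscaled = softmax(Q @ K.T)      # same scores without the 1/sqrt(d_k) factor

print(output.shape)              # (5, 64)
print(weights.sum(axis=-1))      # every row of weights sums to 1
print(weights.max(axis=-1))      # largest weight per row, with scaling
print(unscaled.max(axis=-1))     # largest weight per row, without scaling
```

Without the scaling, the largest weight in each row is never smaller than with it, so the softmax tends to concentrate on a single position as d_k grows.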

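If all input vectors are stacked into a single matrix X, the query, key, and value vectors can likewise be written in matrix form:

$$Q = XW^{Q}, \quad K = XW^{K}, \quad V = XW^{V}$$

where W^Q, W^K, W^V are parameters learned during training. The operation above then reduces to the matrix form

$$Z = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$$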
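The matrix form can be tried out in a few lines of NumPy. This is only a sketch: the stacked input is called `X` here, the sizes are made up, and the three weight matrices are random stand-ins for parameters that a real model would learn.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_model, d_k = 4, 8, 3    # hypothetical: 4 input vectors of width 8, projected to width 3

X = rng.normal(size=(n, d_model))        # one input vector per row
W_Q = rng.normal(size=(d_model, d_k))    # random stand-ins for learned weights
W_K = rng.normal(size=(d_model, d_k))
W_V = rng.normal(size=(d_model, d_k))

# Project the same input three times.
Q, K, V = X @ W_Q, X @ W_K, X @ W_V

# Scaled scores, then a row-wise softmax.
scores = Q @ K.T / np.sqrt(d_k)                  # shape (n, n)
scores -= scores.max(axis=-1, keepdims=True)     # numerical stability only
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)

Z = weights @ V                                  # shape (n, d_k)

# Row 0 of Z is the weighted sum of all value vectors, using row 0 of the weights.
row0 = sum(weights[0, j] * V[j] for j in range(n))
print(Z.shape)                    # (4, 3)
print(np.allclose(Z[0], row0))    # True
```

Each output row mixes information from every input position, which is what lets self-attention relate any two positions directly.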

2. Building the Self_Attention Model

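I use Keras to build the Self_Attention model. Because the network has a fairly large number of intermediate parameters, Self_Attention is implemented here as a custom layer. For how to write a custom layer, see the Keras documentation page "Writing your own Keras layers".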

A custom Keras layer needs to implement the following three methods (note that input_shape includes the batch_size dimension):

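build(input_shape): this is where you define the weights. This method must set self.built = True, which can be done by calling super([Layer], self).build().
call(x): this is where the layer's logic lives. You only need to care about the first argument passed to call, the input tensor, unless you want your layer to support masking.
compute_output_shape(input_shape): if your layer changes the shape of its input, define the shape transformation here so that Keras can infer the shape of each layer automatically.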

from keras.preprocessing import sequence
from keras.datasets import imdb
from matplotlib import pyplot as plt
import pandas as pd
 
from keras import backend as K
from keras.layers import Layer  # keras.engine.topology exists only in older Keras 2.x releases
 
 
class Self_Attention(Layer):
 
    def __init__(self, output_dim, **kwargs):
        self.output_dim = output_dim
        super(Self_Attention, self).__init__(**kwargs)
 
    def build(self, input_shape):
        # Create a trainable weight for this layer
        #inputs.shape = (batch_size, time_steps, seq_len)
        self.kernel = self.add_weight(name='kernel',
                                      shape=(3,input_shape[2], self.output_dim),
                                      initializer='uniform',
                                      trainable=True)
 
        super(Self_Attention, self).build(input_shape)  # must be called at the end
 
    def call(self, x):
        WQ = K.dot(x, self.kernel[0])
        WK = K.dot(x, self.kernel[1])
        WV = K.dot(x, self.kernel[2])
 
        print("WQ.shape",WQ.shape)
 
        print("K.permute_dimensions(WK, [0, 2, 1]).shape",K.permute_dimensions(WK, [0, 2, 1]).shape)
 
 
        QK = K.batch_dot(WQ,K.permute_dimensions(WK, [0, 2, 1]))
 
        QK = QK / (self.output_dim ** 0.5)  # scale by sqrt(d_k); here d_k equals output_dim
 
        QK = K.softmax(QK)
 
        print("QK.shape",QK.shape)
 
        V = K.batch_dot(QK,WV)
 
        return V
 
    def compute_output_shape(self, input_shape):
 
        return (input_shape[0],input_shape[1],self.output_dim)
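The shape bookkeeping of the layer can be checked without Keras. The sketch below is a NumPy analogue of call(), not the layer itself: the function name `self_attention_forward`, the sizes, and the random kernel are assumptions for illustration. It scales by the square root of the projection width (d_k in Section 1) and applies the softmax over the last axis.

```python
import numpy as np

def self_attention_forward(x, kernel):
    """NumPy analogue of call(): x is (batch, time, d_in), kernel is (3, d_in, d_out)."""
    WQ = x @ kernel[0]                         # (batch, time, d_out)
    WK = x @ kernel[1]
    WV = x @ kernel[2]
    QK = WQ @ WK.transpose(0, 2, 1)            # (batch, time, time)
    QK = QK / np.sqrt(kernel.shape[-1])        # scale by sqrt of the projection width
    QK = QK - QK.max(axis=-1, keepdims=True)   # numerical stability only
    QK = np.exp(QK)
    QK = QK / QK.sum(axis=-1, keepdims=True)   # softmax over the last axis
    return QK @ WV                             # (batch, time, d_out)

rng = np.random.default_rng(0)
batch, time_steps, d_in, d_out = 2, 6, 10, 4   # hypothetical sizes
x = rng.normal(size=(batch, time_steps, d_in))
kernel = rng.uniform(-0.05, 0.05, size=(3, d_in, d_out))  # stand-in for the trained kernel

out = self_attention_forward(x, kernel)
print(out.shape)   # (2, 6, 4)
```

The result has shape (batch, time_steps, d_out), which agrees with what compute_output_shape returns for the layer.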
