Some pitfalls when implementing a Hierarchical Attention Network in Keras


Reshape

For a tensor x with x.shape = (a, b, c, d):

If you call keras.layers.Reshape(target_shape=(-1, c, d)), the resulting tensor has shape (?, ?, c, d). Keras's Reshape never touches the batch dimension, so the a and b axes are never merged.

If you call tf.reshape(x, shape=[-1, c, d]) instead, the resulting tensor has shape (a*b, c, d).

To get the tf.reshape behaviour inside Keras code, wrap it in a Lambda layer:
Lambda(lambda x: tf.reshape(x, shape=[-1, c, d]))(x)

nice and cool.
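
As a quick sanity check, here is a minimal sketch of the two behaviours (the concrete dimensions b=3, c=4, d=5 are purely illustrative, and a TensorFlow 1.x style standalone Keras setup is assumed):

import tensorflow as tf
from keras.layers import Input, Lambda, Reshape

# illustrative dimensions only
b, c, d = 3, 4, 5

inp = Input(shape=(b, c, d))  # full tensor: (batch, b, c, d)

# keras.layers.Reshape never touches the batch axis, so the batch and b
# dimensions stay separate: the result is still 4-D, (?, ?, c, d).
r1 = Reshape(target_shape=(-1, c, d))(inp)

# tf.reshape wrapped in a Lambda layer folds the batch axis into the -1,
# producing a 3-D tensor of shape (batch*b, c, d) at runtime.
r2 = Lambda(lambda x: tf.reshape(x, shape=[-1, c, d]))(inp)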

Outputting the attention scores

Here we want the attention layer to output the attention scores themselves, not just compute the weighted sum. The two layers are used like this:

score = Attention()(x)
weighted_sum = MyMerge()([score, x])

import tensorflow as tf
from keras import backend as K
from keras.layers import Layer

class Attention(Layer):
    """Produces one attention score per timestep of a (batch, time, features) input."""

    def __init__(self, **kwargs):
        super(Attention, self).__init__(**kwargs)

    def build(self, input_shape):
        # expects a 3-D input: (batch, time, features)
        assert len(input_shape) == 3
        self.w = self.add_weight(name="attention_weight",
                                 shape=(input_shape[-1], input_shape[-1]),
                                 initializer='uniform',
                                 trainable=True)
        self.b = self.add_weight(name="attention_bias",
                                 shape=(input_shape[-1],),
                                 initializer='uniform',
                                 trainable=True)
        self.v = self.add_weight(name="attention_v",
                                 shape=(input_shape[-1], 1),
                                 initializer='uniform',
                                 trainable=True)
        super(Attention, self).build(input_shape)

    def call(self, inputs):
        x = inputs
        # u_t = tanh(W x_t + b), score_t = softmax_t(v^T u_t)
        att = K.tanh(K.dot(x, self.w) + self.b)
        # the softmax must run over the time axis (axis=1); the default last
        # axis has size 1, which would turn every score into 1.0
        att = K.softmax(K.dot(att, self.v), axis=1)
        return att

    def compute_output_shape(self, input_shape):
        # one score per timestep
        return input_shape[0], input_shape[1], 1

class MyMerge(Layer):
    """Weights the inputs by the attention scores and sums over the time axis."""

    def __init__(self, **kwargs):
        super(MyMerge, self).__init__(**kwargs)

    def call(self, inputs):
        att = inputs[0]  # attention scores, shape (batch, time, 1)
        x = inputs[1]    # features, shape (batch, time, features)
        # tile the scores across the feature dimension so they match x
        att = tf.tile(att, [1, 1, K.int_shape(x)[-1]])
        outputs = tf.multiply(att, x)
        # weighted sum over the time axis -> (batch, features)
        outputs = K.sum(outputs, axis=1)
        return outputs

    def compute_output_shape(self, input_shape):
        # input_shape[1] is the shape of x
        return input_shape[1][0], input_shape[1][2]
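
For reference, here is a minimal sketch of how these two layers can be dropped in after a recurrent encoder (the GRU size, sequence length and feature dimension below are purely illustrative, and the Attention and MyMerge classes above are assumed to be in scope):

from keras.layers import Input, Bidirectional, GRU
from keras.models import Model

# hypothetical sizes, just for illustration
TIME_STEPS, FEATURES = 30, 64

seq_input = Input(shape=(TIME_STEPS, FEATURES))
h = Bidirectional(GRU(64, return_sequences=True))(seq_input)  # (batch, 30, 128)

score = Attention()(h)                # (batch, 30, 1) attention scores
weighted_sum = MyMerge()([score, h])  # (batch, 128) attended representation

# exposing the scores as a second output makes them easy to inspect
model = Model(seq_input, [weighted_sum, score])
model.summary()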

Nesting Models in Keras

The following is taken from https://github.com/uhauha2929/examples/blob/master/Hierarchical%20Attention%20Networks%20.ipynb
Note that sentEncoder is a Model; later it is wrapped as TimeDistributed(sentEncoder) and called just like an ordinary layer.

from keras.layers import Input, Embedding, LSTM, Bidirectional, TimeDistributed, Dense
from keras.models import Model

# word_index, EMBEDDING_DIM, MAX_SENT_LENGTH and MAX_SENTS come from the
# preprocessing code in the linked notebook.
embedding_layer = Embedding(len(word_index) + 1,
                            EMBEDDING_DIM,
                            input_length=MAX_SENT_LENGTH)

# Sentence encoder: embeds the words of one sentence and runs a BiLSTM over them.
sentence_input = Input(shape=(MAX_SENT_LENGTH,), dtype='int32')
embedded_sequences = embedding_layer(sentence_input)
l_lstm = Bidirectional(LSTM(100))(embedded_sequences)
sentEncoder = Model(sentence_input, l_lstm)

# Document encoder: the sentEncoder Model is applied to every sentence of a
# review via TimeDistributed, exactly as if it were a single layer.
review_input = Input(shape=(MAX_SENTS, MAX_SENT_LENGTH), dtype='int32')
review_encoder = TimeDistributed(sentEncoder)(review_input)
l_lstm_sent = Bidirectional(LSTM(100))(review_encoder)
preds = Dense(2, activation='softmax')(l_lstm_sent)
model = Model(review_input, preds)
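
If you want to double-check the nesting, the model can be compiled and summarised like any other Keras model (the loss/optimizer choice below is just one reasonable option for the two-class softmax head):

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.summary()
# In the summary, TimeDistributed(sentEncoder) appears as a single layer with
# output shape (None, MAX_SENTS, 200), since Bidirectional(LSTM(100)) outputs
# 2 * 100 = 200 features per sentence.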

