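Conclusion:
- Masking creates a mask matrix that is passed from layer to layer alongside each layer's output tensor. If a later layer cannot accept a mask, an error is raised.
- Embedding: mask_zero works.
- A Masking layer may come before Concatenate and Dense layers. Judging by the output tensor the mask appears to have no effect there, but the mask is still passed down and affects the layers that follow.
- The mask mainly affects RNN layers, which skip the masked timesteps. In the output tensor, a masked timestep is either 0 or identical to the result of the previous timestep.
- If a timestep is masked in any one of the inputs to Concatenate, that timestep is masked in the whole output.
- Do not apply a Masking layer more than once, because each one redefines the mask. This matters most after an Embedding layer: the embedded values of masked timesteps are not 0, so mask_value will not match all of them.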
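The bookkeeping behind these points can be sketched in plain Python. This is only an emulation of the rules described above, not Keras code: the helper names (`masking_mask`, `embedding_mask`, `concat_mask`) and the embedding table are made up for illustration, and `True` is assumed to mean "keep this timestep".

```python
# Plain-Python emulation of the mask rules listed above (not Keras itself).
# All helper names and the embedding table are hypothetical.

def masking_mask(batch, mask_value=0):
    """Masking: keep a timestep if any feature differs from mask_value."""
    return [[any(v != mask_value for v in step) for step in sample]
            for sample in batch]

def embedding_mask(ids):
    """Embedding with mask_zero=True: index 0 marks a masked timestep."""
    return [[i != 0 for i in sample] for sample in ids]

def concat_mask(*masks):
    """Concatenate: a timestep is kept only if every input keeps it."""
    return [[all(flags) for flags in zip(*rows)] for rows in zip(*masks)]

num = [[[0, 0, 0], [1, 2, 3], [0, 0, 0], [1, 2, 3], [0, 0, 0]]]
version = [[0, 1, 1, 0, 0]]

m_num = masking_mask(num)
m_version = embedding_mask(version)
merged = concat_mask(m_num, m_version)
print(m_num)      # [[False, True, False, True, False]]
print(m_version)  # [[False, True, True, False, False]]
print(merged)     # [[False, True, False, False, False]]

# Why a second Masking layer after Embedding is harmful: the vector looked up
# for index 0 is generally non-zero, so mask_value=0 no longer matches it.
table = {0: [0.3, -0.1], 1: [0.5, 0.2]}  # made-up embedding weights
embedded = [[table[i] for i in sample] for sample in version]
remasked = masking_mask(embedded)
print(remasked)   # [[True, True, True, True, True]]
```

The last line shows the padding information from mask_zero being lost once the mask is recomputed from the embedded values.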
Experiment:
Model code:
    import numpy as np
    from keras.layers import (Activation, Dense, Embedding, GRU, Input,
                              Masking, Reshape, concatenate)
    from keras.models import Model
    from keras.optimizers import Adam


    def rnn_model(x_train, y_train):
        # Inputs: one numeric and two categorical, each (timesteps, features)
        num = Input(shape=(x_train[0].shape[1], x_train[0].shape[2]))
        version = Input(shape=(x_train[1].shape[1], x_train[1].shape[2]))
        missing = Input(shape=(x_train[2].shape[1], x_train[2].shape[2]))
        inputs = [num, version, missing]

        # Embeddings for the categorical variables; index 0 is masked
        reshape_version = Reshape(target_shape=(-1,))(version)
        embedding_version = Embedding(
            180, 2,
            input_length=x_train[1].shape[1] * x_train[1].shape[2],
            mask_zero=True, name='M_version')(reshape_version)
        reshape_missing = Reshape(target_shape=(-1,))(missing)
        embedding_missing = Embedding(
            4, 1,
            input_length=x_train[2].shape[1] * x_train[2].shape[2],
            mask_zero=True, name='M_missing')(reshape_missing)

        # Mask numeric timesteps whose features are all 0
        masked_num = Masking(mask_value=0, name='M_num')(num)

        # Concatenate layer
        merge_ft = concatenate(
            [masked_num, embedding_version, embedding_missing],
            axis=-1, name='concate')

        # GRUs over variable-length sequences.
        # Do not add another Masking layer here: a new one would overwrite
        # the mask tensor. If any part of a timestep is masked, the whole
        # timestep is masked and is not computed.
        # merge_ft = Dense(3, name='test')(merge_ft)
        gru_1 = GRU(3, return_sequences=True, name='gru_1')(merge_ft)
        gru_2 = GRU(3, return_sequences=True, name='gru_2')(gru_1)
        gru_3 = GRU(3, name='gru_3')(gru_2)
        dense_ft = Dense(2, name='dense_ft')(gru_3)
        outputs = Activation('softmax', name='outputs')(dense_ft)

        model = Model(inputs=inputs, outputs=outputs)
        # lr and decay are the Keras 2.x argument names
        adam = Adam(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=1e-08,
                    decay=1e-6)
        model.compile(loss='categorical_crossentropy', optimizer=adam)
        return model
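One detail of the model that is easy to miss is the shape bookkeeping before the concatenate call: flattening a categorical input and embedding it only lines up with the numeric input when that categorical input has a single feature per timestep. The following is a rough sketch in plain Python, with batch dimensions omitted; `embedded_shape` and `concat_last_axis` are hypothetical helpers, not Keras functions.

```python
# Shape arithmetic only; helper names are hypothetical and batch size is omitted.

def embedded_shape(timesteps, features, embed_dim):
    # Reshape(target_shape=(-1,)) flattens (timesteps, features) into one axis,
    # and Embedding then appends an axis of size embed_dim.
    return (timesteps * features, embed_dim)

def concat_last_axis(*shapes):
    # Concatenation on the last axis needs all leading axes to agree.
    leading = {s[:-1] for s in shapes}
    if len(leading) != 1:
        raise ValueError("leading axes differ: %s" % sorted(leading))
    return shapes[0][:-1] + (sum(s[-1] for s in shapes),)

num_shape = (5, 3)                       # 5 timesteps, 3 numeric features
version_shape = embedded_shape(5, 1, 2)  # one category per timestep
missing_shape = embedded_shape(5, 1, 1)
merged_shape = concat_last_axis(num_shape, version_shape, missing_shape)
print(merged_shape)  # (5, 6)
```

With two categorical features per timestep the flattened axis would have length 10 instead of 5, and `concat_last_axis` would raise, which mirrors why the categorical inputs here carry one feature each.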
Test code:
    if __name__ == '__main__':
        # Test the mask with a fake numeric input of size 1*5*3
        num = np.array([[[0, 0, 0], [1, 2, 3], [0, 0, 0], [1, 2, 3], [0, 0, 0]]])
        # Two fake categorical inputs of size 1*5*1; 0 marks a masked timestep
        c1 = np.array([[[0], [1], [0], [1], [0]]])
        c2 = np.array([[[0], [1], [0], [1], [0]]])
        y = np.array([[0, 1]])
        x = [num, c1, c2]

        model = rnn_model(x, y)

        # Inspect the output of an intermediate layer
        layer_name = 'gru_1'
        intermediate_model = Model(inputs=model.input,
                                   outputs=model.get_layer(layer_name).output)
        print(intermediate_model.predict(x))
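What to look for in the printed gru_1 output follows from the conclusion: masked timesteps should come out as 0 (before any real step) or repeat the previous timestep. The sketch below imitates that carry-forward rule with a toy recurrence in plain Python. It is not a GRU, and `masked_rnn` and `toy_step` are hypothetical names used only for illustration.

```python
# Toy illustration of skipping masked timesteps; this is not a GRU.

def masked_rnn(inputs, mask, step, h0=0.0):
    """Run `step` only on kept timesteps; masked ones repeat the last state."""
    outputs = []
    h = h0
    for x_t, keep in zip(inputs, mask):
        if keep:
            h = step(h, x_t)
        outputs.append(h)
    return outputs

def toy_step(h, x_t):
    # The +1 bias makes an all-zero input still change the state,
    # so skipped steps are visibly different from computed ones.
    return 0.5 * h + sum(x_t) + 1

num = [[0, 0, 0], [1, 2, 3], [0, 0, 0], [1, 2, 3], [0, 0, 0]]
mask = [False, True, False, True, False]  # same pattern as the test input

with_mask = masked_rnn(num, mask, toy_step)
without_mask = masked_rnn(num, [True] * 5, toy_step)
print(with_mask)     # [0.0, 7.0, 7.0, 10.5, 10.5]
print(without_mask)  # [1.0, 7.5, 4.75, 9.375, 5.6875]
```

In the masked run the first entry stays at the initial state and entries 3 and 5 repeat their predecessors, which is the pattern to check for in the rows printed by the test code.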