SimpleRNNCell Explained
I. Summary
In one sentence:
units: positive integer, dimensionality of the output space, i.e. the number of hidden units.
recurrent_dropout: dropout applied to the hidden-to-hidden (recurrent) connections.
class SimpleRNNCell(Layer): """Cell class for SimpleRNN. # Arguments units: 正整數,輸出空間的維度,即隱藏層神經元數量. activation: 激活函數,默認是tanh use_bias: Boolean, 是否使用偏置向量. kernel_initializer: 輸入和隱藏層之間的權重參數初始化器.默認使用'glorot_uniform' recurrent_initializer: 隱藏層之間的權重參數初始化器.默認使用'orthogonal' bias_initializer: 偏置向量的初始化器. kernel_regularizer: 輸入和隱藏層之間權重參數的正則化方法. recurrent_regularizer: 隱藏層之間權重參數的正則化方法. bias_regularizer: 偏置向量的正則化方法. kernel_constraint: kernel的約束方法. recurrent_constraint: 隱藏層權重的約束函數. bias_constraint: 偏置向量的約束函數. dropout: 輸入和隱藏層之間的dropout. recurrent_dropout: 隱藏層之間的dropout. """
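As a quick illustration of the two arguments above, here is a minimal sketch (assuming TensorFlow 2.x and tf.keras; the numeric values are arbitrary) that constructs a cell with explicit units, dropout, and recurrent_dropout:

import tensorflow as tf

# Illustrative values only; see the full argument list in the source below.
cell = tf.keras.layers.SimpleRNNCell(
    units=64,                # dimensionality of the hidden state / output
    activation='tanh',       # default activation
    dropout=0.2,             # dropout on the input-to-hidden connection
    recurrent_dropout=0.1)   # dropout on the hidden-to-hidden connection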
II. SimpleRNNCell in Detail
Adapted from / reference: Keras source code (1): SimpleRNNCell explained
http://blog.csdn.net/u013230189/article/details/108208123
1. Source Code Walkthrough
The SimpleRNNCell class can be understood as the computation of a single time step of an RNN, while the RNN layer chains multiple such cells together and runs them over the whole sequence.

In the unrolled-RNN figure from the original post (not reproduced here), each small red box represents one cell's computation, and the large red box enclosing them represents the computation of the whole RNN.
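To make the cell-versus-RNN distinction concrete, here is a minimal sketch (assuming TensorFlow 2.x and tf.keras.layers.SimpleRNNCell; the shapes are illustrative) that manually unrolls a cell over the time dimension, which is essentially what the RNN wrapper does for you:

import tensorflow as tf

batch_size, time_steps, input_dim, units = 4, 6, 8, 16
x = tf.random.normal([batch_size, time_steps, input_dim])  # (batch, time, features)

cell = tf.keras.layers.SimpleRNNCell(units)
state = [tf.zeros([batch_size, units])]    # initial hidden state h_0

outputs = []
for t in range(time_steps):                # one cell call per time step
    out, state = cell(x[:, t, :], state)   # out == state[0] for SimpleRNNCell
    outputs.append(out)

print(len(outputs), outputs[-1].shape)     # 6 (4, 16)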
SimpleRNNCell inherits from the Layer base class and consists of four main methods:
- __init__(): constructor, mainly used to initialize the cell's hyperparameters
- build(): creates the weight variables used by the layer
- call(): performs the forward computation on the inputs and produces the output
- get_config(): returns the layer's configuration
The parameters and methods are explained in the annotated source below:
class SimpleRNNCell(Layer):
"""Cell class for SimpleRNN. # Arguments units: 正整數,輸出空間的維度,即隱藏層神經元數量. activation: 激活函數,默認是tanh use_bias: Boolean, 是否使用偏置向量. kernel_initializer: 輸入和隱藏層之間的權重參數初始化器.默認使用'glorot_uniform' recurrent_initializer: 隱藏層之間的權重參數初始化器.默認使用'orthogonal' bias_initializer: 偏置向量的初始化器. kernel_regularizer: 輸入和隱藏層之間權重參數的正則化方法. recurrent_regularizer: 隱藏層之間權重參數的正則化方法. bias_regularizer: 偏置向量的正則化方法. kernel_constraint: kernel的約束方法. recurrent_constraint: 隱藏層權重的約束函數. bias_constraint: 偏置向量的約束函數. dropout: 輸入和隱藏層之間的dropout. recurrent_dropout: 隱藏層之間的dropout. """
def __init__(self, units,
activation='tanh',
use_bias=True,
kernel_initializer='glorot_uniform',
recurrent_initializer='orthogonal',
bias_initializer='zeros',
kernel_regularizer=None,
recurrent_regularizer=None,
bias_regularizer=None,
kernel_constraint=None,
recurrent_constraint=None,
bias_constraint=None,
dropout=0.,
recurrent_dropout=0.,
**kwargs):
super(SimpleRNNCell, self).__init__(**kwargs)
self.units = units
self.activation = activations.get(activation)
self.use_bias = use_bias
self.kernel_initializer = initializers.get(kernel_initializer)
self.recurrent_initializer = initializers.get(recurrent_initializer)
self.bias_initializer = initializers.get(bias_initializer)
self.kernel_regularizer = regularizers.get(kernel_regularizer)
self.recurrent_regularizer = regularizers.get(recurrent_regularizer)
self.bias_regularizer = regularizers.get(bias_regularizer)
self.kernel_constraint = constraints.get(kernel_constraint)
self.recurrent_constraint = constraints.get(recurrent_constraint)
self.bias_constraint = constraints.get(bias_constraint)
self.dropout = min(1., max(0., dropout)) # clamp dropout to the range [0, 1]
self.recurrent_dropout = min(1., max(0., recurrent_dropout))
self.state_size = self.units
self.output_size = self.units
self._dropout_mask = None
self._recurrent_dropout_mask = None
def build(self, input_shape):
# build() creates the layer's weights.
# It is called once from __call__ the first time the layer is used, when the
# input shape is known, and it initializes the weight variables.
# Input-to-hidden weights: add_weight() creates the variable; its `trainable`
# argument defaults to True, so the weights are updated during training.
self.kernel = self.add_weight(shape=(input_shape[-1], self.units),
name='kernel',
initializer=self.kernel_initializer,
regularizer=self.kernel_regularizer,
constraint=self.kernel_constraint)
# Hidden-to-hidden (recurrent) weights shared across time steps
self.recurrent_kernel = self.add_weight(
shape=(self.units, self.units),
name='recurrent_kernel',
initializer=self.recurrent_initializer,
regularizer=self.recurrent_regularizer,
constraint=self.recurrent_constraint)
# Bias vector, only created if use_bias is True
if self.use_bias:
self.bias = self.add_weight(shape=(self.units,),
name='bias',
initializer=self.bias_initializer,
regularizer=self.bias_regularizer,
constraint=self.bias_constraint)
else:
self.bias = None
# build() is called once before __call__; if it has already run it is skipped.
# The flag self.built records whether build() has been called: once it is True,
# subsequent __call__ invocations will not call build() again, which is why the
# built-in layers never require an explicit build() call from the user.
self.built = True
# The forward pass of the cell is computed here.
# `inputs` is the input tensor for a single time step; `states` is a list
# containing the previous time step's hidden state.
def call(self, inputs, states, training=None):
prev_output = states[0]
if 0 < self.dropout < 1 and self._dropout_mask is None:
# Generate a dropout mask tensor that will be applied to the inputs
self._dropout_mask = _generate_dropout_mask(
K.ones_like(inputs),
self.dropout,
training=training)
if (0 < self.recurrent_dropout < 1 and
self._recurrent_dropout_mask is None):
self._recurrent_dropout_mask = _generate_dropout_mask(
K.ones_like(prev_output),
self.recurrent_dropout,
training=training)
dp_mask = self._dropout_mask
rec_dp_mask = self._recurrent_dropout_mask
# Apply dropout to the inputs (if any), then multiply by the kernel weights
if dp_mask is not None:
h = K.dot(inputs * dp_mask, self.kernel)
else:
h = K.dot(inputs, self.kernel)
# Add the bias vector if one is used
if self.bias is not None:
h = K.bias_add(h, self.bias)
# Hidden-to-hidden computation, with optional recurrent dropout
if rec_dp_mask is not None:
prev_output *= rec_dp_mask
output = h + K.dot(prev_output, self.recurrent_kernel)
# Apply the activation function, if any
if self.activation is not None:
output = self.activation(output)
# Properly set learning phase on output tensor.
if 0 < self.dropout + self.recurrent_dropout:
if training is None:
output._uses_learning_phase = True
return output, [output]
# Return the layer's configuration
def get_config(self):
config = {'units': self.units,
'activation': activations.serialize(self.activation),
'use_bias': self.use_bias,
'kernel_initializer':
initializers.serialize(self.kernel_initializer),
'recurrent_initializer':
initializers.serialize(self.recurrent_initializer),
'bias_initializer': initializers.serialize(self.bias_initializer),
'kernel_regularizer':
regularizers.serialize(self.kernel_regularizer),
'recurrent_regularizer':
regularizers.serialize(self.recurrent_regularizer),
'bias_regularizer': regularizers.serialize(self.bias_regularizer),
'kernel_constraint': constraints.serialize(self.kernel_constraint),
'recurrent_constraint':
constraints.serialize(self.recurrent_constraint),
'bias_constraint': constraints.serialize(self.bias_constraint),
'dropout': self.dropout,
'recurrent_dropout': self.recurrent_dropout}
base_config = super(SimpleRNNCell, self).get_config()
return dict(list(base_config.items()) + list(config.items()))
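Putting the call() logic into a formula, each step computes output = activation(inputs @ kernel + bias + prev_output @ recurrent_kernel). The following sketch (assuming TensorFlow 2.x; dropout left at 0 so the masks play no role) reproduces that computation with plain tensor ops and checks it against the cell's own output:

import numpy as np
import tensorflow as tf

batch, input_dim, units = 3, 5, 4
x = tf.random.normal([batch, input_dim])
h_prev = tf.random.normal([batch, units])

cell = tf.keras.layers.SimpleRNNCell(units)
out, _ = cell(x, [h_prev])                  # first call builds kernel, recurrent_kernel, bias

# Manual recomputation of the same step: h_t = tanh(x @ W + b + h_{t-1} @ U)
manual = tf.tanh(tf.matmul(x, cell.kernel)
                 + cell.bias
                 + tf.matmul(h_prev, cell.recurrent_kernel))

print(np.allclose(out.numpy(), manual.numpy(), atol=1e-5))  # expected: True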
The add_weight() method used in build() is inherited from the parent Layer class; any parameter of a layer that should be trained must be created through it.
def add_weight(self,
name,
shape,
dtype=None,
initializer=None,
regularizer=None,
trainable=True,
constraint=None):
"""Adds a weight variable to the layer. # Arguments name: String, 權重變量的名稱. shape: 權重的shape. dtype: 權重的數據類型. initializer: 權重的初始化方法. regularizer: 權重的正則化方法. trainable: 權重是否可更新. constraint: 可選的約束方法. # Returns 返回權重張量. """
initializer = initializers.get(initializer)
if dtype is None:
dtype = self.dtype
weight = K.variable(initializer(shape, dtype=dtype),
dtype=dtype,
name=name,
constraint=constraint)
if regularizer is not None:
with K.name_scope('weight_regularizer'):
self.add_loss(regularizer(weight))
if trainable:
self._trainable_weights.append(weight)
else:
self._non_trainable_weights.append(weight)
return weight
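To show how add_weight() is typically used outside of SimpleRNNCell, here is a minimal sketch of a hypothetical custom dense layer (assuming TensorFlow 2.x tf.keras; the class and variable names are made up for the example):

import tensorflow as tf

class MyDense(tf.keras.layers.Layer):
    """A hypothetical dense layer illustrating the build()/add_weight() pattern."""

    def __init__(self, units, **kwargs):
        super().__init__(**kwargs)
        self.units = units

    def build(self, input_shape):
        # Trainable parameters must be registered via add_weight()
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer='glorot_uniform',
                                 trainable=True,
                                 name='w')
        self.b = self.add_weight(shape=(self.units,),
                                 initializer='zeros',
                                 trainable=True,
                                 name='b')
        self.built = True

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b

layer = MyDense(8)
y = layer(tf.random.normal([2, 16]))
print(y.shape, len(layer.trainable_weights))  # (2, 8) 2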
2. Usage Example
import tensorflow as tf
from tensorflow import keras
# Input tensor of shape (batch_size, time_step, embedding_dim)
batch_size = 10
time_step = 20
embedding_dim = 100
train_x = tf.random.normal(shape=[batch_size, time_step, embedding_dim])
hidden_dim = 64  # dimensionality of the hidden state
h0 = tf.random.normal(shape=[batch_size, hidden_dim])  # initial hidden state
x0 = train_x[:, 0, :]  # input of the first time step
simpleRnnCell = keras.layers.SimpleRNNCell(hidden_dim)
out, h1 = simpleRnnCell(x0, [h0])  # feed the current step's input x0 and the previous hidden state h0 into the cell
print(out.shape, h1[0].shape)  # (10, 64) (10, 64)
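To process the whole sequence instead of a single step, the cell can be wrapped in an RNN layer, which performs the per-time-step loop internally. A minimal sketch under the same assumptions (tf.keras, the tensors defined above):

# Wrap the cell in an RNN layer to iterate over all time steps at once.
rnn = keras.layers.RNN(keras.layers.SimpleRNNCell(hidden_dim),
                       return_sequences=True,   # return the output of every time step
                       return_state=True)       # also return the final hidden state
seq_out, last_state = rnn(train_x)
print(seq_out.shape, last_state.shape)          # (10, 20, 64) (10, 64)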
