Section 21: Implementing LSTM and GRU networks with TensorFlow

Reposted article; originally published 2018-05-12 23:29. Tag: tensorflow

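This section shows how to implement LSTM and GRU networks in TensorFlow.

I. LSTM networks

Long Short-Term Memory networks, usually just called LSTMs, are a special kind of RNN capable of learning long-term dependencies. The LSTM was introduced by Hochreiter & Schmidhuber (1997) and was later refined and popularized by Alex Graves, among others. LSTMs have been remarkably successful on a wide range of problems and are now widely used.

LSTMs are explicitly designed to avoid the long-term dependency problem. Remembering information over long periods is in practice their default behavior, not something they struggle to learn.

The core idea of the LSTM structure is to introduce a connection called the cell state, which stores what the network wants to remember. Three gates are added around it:

Forget gate: as the name suggests, it controls forgetting. It outputs a value between 0 and 1 for each component, which decides how much of the cell state from the previous time step is kept.
Input gate: controls how much of the input at the current sequence position is written into the cell state.
Output gate: decides how much of the cell state is exposed as the hidden-state output at the current step.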
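
To make the three gates concrete, here is a minimal NumPy sketch of a single LSTM time step. It is an illustration only, not TensorFlow's implementation: the function name `lstm_step`, the single fused weight matrix `W`, the gate ordering inside it, and the random toy data are all assumptions made for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b, forget_bias=1.0):
    """One LSTM time step (hypothetical helper for illustration).

    x:      [batch, n_input]   input at the current position
    h_prev: [batch, n_hidden]  hidden state from the previous step
    c_prev: [batch, n_hidden]  cell state from the previous step
    W:      [n_input + n_hidden, 4 * n_hidden], b: [4 * n_hidden]
    """
    z = np.concatenate([x, h_prev], axis=1) @ W + b
    # Assumed ordering of the four blocks: input gate, candidate, forget gate, output gate
    i, g, f, o = np.split(z, 4, axis=1)
    # Forget gate scales the old cell state; input gate scales the new candidate
    c = sigmoid(f + forget_bias) * c_prev + sigmoid(i) * np.tanh(g)
    # Output gate decides how much of the cell state is exposed as h
    h = sigmoid(o) * np.tanh(c)
    return h, c

rng = np.random.RandomState(0)
batch, n_input, n_hidden = 2, 5, 3
W = rng.randn(n_input + n_hidden, 4 * n_hidden) * 0.1
b = np.zeros(4 * n_hidden)
x = rng.randn(batch, n_input)
h0 = np.zeros((batch, n_hidden))
c0 = np.zeros((batch, n_hidden))

h1, c1 = lstm_step(x, h0, c0, W, b)
print(h1.shape, c1.shape)
```

Both returned arrays have shape [batch, n_hidden]; the pair (c, h) corresponds to the two-part state that the TensorFlow LSTM cells described later return when state_is_tuple is True.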

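II. LSTM variants

Above we described the standard LSTM. But not all LSTMs look the same. In fact, almost every paper involving LSTMs uses a slightly different version. The differences are minor, but a few are worth mentioning.

1. Peephole connections

One popular LSTM variant, introduced by Gers & Schmidhuber (2000), adds "peephole connections": each gate also receives the cell state as input.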


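In this form, peepholes are added to every gate, but many papers add them to only some of the gates.

2. Coupled forget and input gates

Another variant couples the forget and input gates. Instead of deciding separately what to forget and what new information to add, the two decisions are made together. We only forget when we are going to write something in its place, and we only write new values into the state when we forget something older.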


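3. GRU

A more substantial variant is the Gated Recurrent Unit (GRU), introduced by Cho, et al. (2014). It combines the forget and input gates into a single update gate. It also merges the cell state and the hidden state, and makes some other changes. The resulting model is simpler than the standard LSTM and has become very popular. Because a GRU returns one state fewer than an LSTM while often giving nearly the same results, using a GRU can make the code somewhat simpler.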
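
The following NumPy sketch shows one GRU time step, to illustrate how a single update gate replaces the separate forget and input gates and how there is only one state. The helper name `gru_step`, the weight layout, and the toy data are assumptions for illustration; this is not the TensorFlow implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev, W_gates, b_gates, W_cand, b_cand):
    """One GRU time step (hypothetical helper for illustration).

    x:       [batch, n_input]
    h_prev:  [batch, n_hidden]
    W_gates: [n_input + n_hidden, 2 * n_hidden]  reset and update gates
    W_cand:  [n_input + n_hidden, n_hidden]      candidate state
    """
    xh = np.concatenate([x, h_prev], axis=1)
    r, u = np.split(sigmoid(xh @ W_gates + b_gates), 2, axis=1)
    # The reset gate r decides how much of the old state feeds the candidate
    cand = np.tanh(np.concatenate([x, r * h_prev], axis=1) @ W_cand + b_cand)
    # The update gate u plays both roles: keep u of the old state, write (1 - u) of the new
    return u * h_prev + (1.0 - u) * cand

rng = np.random.RandomState(0)
batch, n_input, n_hidden = 2, 5, 3
W_gates = rng.randn(n_input + n_hidden, 2 * n_hidden) * 0.1
b_gates = np.zeros(2 * n_hidden)
W_cand = rng.randn(n_input + n_hidden, n_hidden) * 0.1
b_cand = np.zeros(n_hidden)
x = rng.randn(batch, n_input)
h0 = np.zeros((batch, n_hidden))

h1 = gru_step(x, h0, W_gates, b_gates, W_cand, b_cand)
print(h1.shape)
```

Unlike the LSTM, the step returns a single array: the GRU's state and its output are the same thing.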


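These are only a few of the popular LSTM variants. There are many others, such as the Depth Gated RNN by Yao, et al. (2015). There are also completely different approaches to long-term dependencies, such as the Clockwork RNN by Koutnik, et al. (2014).

Which variant is best? Do the differences matter? Greff, et al. (2015) compared the popular variants and found that they perform about the same. Jozefowicz, et al. (2015) tested more than ten thousand RNN architectures and found some that worked better than LSTMs on certain tasks.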

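III. Bi-RNN networks

A Bi-RNN, or bidirectional RNN, is built from two RNNs that run in opposite directions.

RNNs are good at processing sequential data. Since the data follows a sequential pattern, we can learn that pattern in the forward direction and also in the reverse direction. A network that combines both directions generally fits the data better than a unidirectional recurrent network.

A bidirectional RNN is processed much like a unidirectional one: in addition to the pass over the sequence in forward order, a second pass reads the sequence in reverse order, and both passes are connected to the same output layer.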
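
As a rough illustration of the two passes, the NumPy sketch below runs a plain tanh RNN over a sequence in forward order and a second one in reverse order, then joins the two outputs per time step. The helper `run_rnn`, the parameter tuples, and the toy sizes are assumptions for this example only.

```python
import numpy as np

def run_rnn(xs, W, U, b):
    """Run a plain tanh RNN over xs of shape [T, batch, n_in]; returns [T, batch, n_hidden]."""
    h = np.zeros((xs.shape[1], U.shape[0]))
    outs = []
    for x in xs:
        h = np.tanh(x @ W + h @ U + b)
        outs.append(h)
    return np.stack(outs)

rng = np.random.RandomState(0)
T, batch, n_in, n_hidden = 4, 2, 5, 3
xs = rng.randn(T, batch, n_in)

# Two independent parameter sets, one per direction
params_fw = (rng.randn(n_in, n_hidden) * 0.1, rng.randn(n_hidden, n_hidden) * 0.1, np.zeros(n_hidden))
params_bw = (rng.randn(n_in, n_hidden) * 0.1, rng.randn(n_hidden, n_hidden) * 0.1, np.zeros(n_hidden))

out_fw = run_rnn(xs, *params_fw)
# Backward pass: feed the sequence reversed, then flip the outputs back into time order
out_bw = run_rnn(xs[::-1], *params_bw)[::-1]

# Each time step now carries both directions: [T, batch, 2 * n_hidden]
outputs = np.concatenate([out_fw, out_bw], axis=2)
print(outputs.shape)
```

After the flip, position t of `out_bw` summarizes the inputs from t to the end of the sequence, while position t of `out_fw` summarizes the inputs from the start up to t.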

IV. The cell library in TensorFlow

TensorFlow defines five cell-related classes that are covered here. A cell can be thought of as a hidden layer in a DNN, albeit a rather special one. They are listed below.

1. BasicRNNCell class

The most basic RNN cell:

def __init__(self, num_units, activation=None, reuse=None)

num_units: number of units in the RNN cell, i.e. the number of hidden-layer nodes.
activation: Nonlinearity to use. Default: `tanh`.
reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not `True`, and the existing scope already has the given variables, an error is raised.

2. BasicLSTMCell class

A basic LSTM cell:

def __init__(self, num_units, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None):

num_units: number of units in the LSTM cell, i.e. the number of hidden-layer nodes.
forget_bias: bias added to the forget gate.
state_is_tuple: the cell state c_t and the output h_t are separate values. When True, they are returned together as a tuple (c=array([[...]]), h=array([[...]])); when False, the two are concatenated along the column axis into a single [batch, 2n] array. True is recommended.
activation: Activation function of the inner states. Default: `tanh`.
reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not `True`, and the existing scope already has the given variables, an error is raised.

3. LSTMCell class

A more advanced LSTM implementation.

def __init__(self, num_units, use_peepholes=False, cell_clip=None, initializer=None, num_proj=None, proj_clip=None, num_unit_shards=None, num_proj_shards=None, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None):

num_units: number of units in the LSTM cell, i.e. the number of hidden-layer nodes.
use_peepholes: defaults to False; True enables peephole connections.
cell_clip: (optional) a float value; if provided, the cell state is clipped to this value before the cell output activation.
initializer: (optional) The initializer to use for the weight and projection matrices.
num_proj: (optional) int, The output dimensionality for the projection matrices. If None, no projection is performed. The projection layer reduces the output dimension and can be used to compress the model.
proj_clip: (optional) A float value. If `num_proj > 0` and `proj_clip` is provided, then the projected values are clipped elementwise to within `[-proj_clip, proj_clip]`.
num_unit_shards: Deprecated, will be removed by Jan. 2017. Use a variable_scope partitioner instead.
num_proj_shards: Deprecated, will be removed by Jan. 2017. Use a variable_scope partitioner instead.
forget_bias: Biases of the forget gate are initialized by default to 1 in order to reduce the scale of forgetting at the beginning of the training.
state_is_tuple: If True, accepted and returned states are 2-tuples of the `c_state` and `m_state`. If False, they are concatenated along the column axis. This latter behavior will soon be deprecated.
activation: Activation function of the inner states. Default: `tanh`.
reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not `True`, and the existing scope already has the given variables, an error is raised.

4. GRUCell class

def __init__(self, num_units, activation=None, reuse=None, kernel_initializer=None, bias_initializer=None):

num_units: number of units in the GRU cell, i.e. the number of hidden-layer nodes.
activation: Nonlinearity to use. Default: `tanh`.
reuse: (optional) Python boolean describing whether to reuse variables in an existing scope. If not `True`, and the existing scope already has the given variables, an error is raised.

5. MultiRNNCell

A multi-layer RNN:

def __init__(self, cells, state_is_tuple=True)

cells: list of RNNCells that will be composed in this order. The cells in the list are stacked one on top of another: with cells=[cell1,cell2] there are two layers, and the data passes through cell1 and then through cell2.
state_is_tuple: If True, accepted and returned states are n-tuples, where `n = len(cells)`. If False, the states are all concatenated along the column axis. This latter behavior will soon be deprecated. With True, the returned state holds one entry per layer; for a plain RNN or GRU layer that entry has shape [batch, num_units], and for an LSTM layer it is itself a (c, h) pair of such arrays.
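
To illustrate the stacking, the NumPy sketch below performs one time step through two stacked plain tanh RNN layers: the output of the first layer is the input of the second, and the state is a tuple with one entry per layer. The helpers `rnn_cell` and `multi_cell_step` and the layer sizes are hypothetical, chosen only for this example.

```python
import numpy as np

def rnn_cell(x, h, W, U, b):
    """One step of a plain tanh RNN layer; returns (output, new_state)."""
    h_new = np.tanh(x @ W + h @ U + b)
    return h_new, h_new

def multi_cell_step(x, states, layers):
    """One time step through stacked layers (hypothetical helper).

    states: tuple with one state per layer
    layers: list of (W, U, b) parameter tuples, in stacking order
    """
    new_states = []
    for params, h in zip(layers, states):
        # The output of one layer becomes the input of the next
        x, h_new = rnn_cell(x, h, *params)
        new_states.append(h_new)
    return x, tuple(new_states)

rng = np.random.RandomState(0)
batch, n_in = 2, 5
sizes = [3, 6]                      # layers may have different widths
layers, states, prev = [], [], n_in
for n in sizes:
    layers.append((rng.randn(prev, n) * 0.1, rng.randn(n, n) * 0.1, np.zeros(n)))
    states.append(np.zeros((batch, n)))
    prev = n

x = rng.randn(batch, n_in)
out, new_states = multi_cell_step(x, tuple(states), layers)
print(out.shape, [s.shape for s in new_states])
```

The output width is that of the last layer, which is why a stack that ends in a wider cell produces wider outputs.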

V. Building an RNN from cell classes

Once the cells are defined, they still need to be connected into an RNN. TensorFlow provides several ready-made construction functions that can be called directly:

1. Static RNN

def tf.contrib.rnn.static_rnn(cell, inputs, initial_state=None, dtype=None, sequence_length=None, scope=None):

cell: an instantiated cell object.
inputs: A length T list of inputs, each a `Tensor` of shape `[batch_size, input_size]`, or a nested tuple of such elements. The input is a list of tensors in time order; each element holds the values of one time step.
initial_state: (optional) An initial state for the RNN. If `cell.state_size` is an integer, this must be a `Tensor` of appropriate type and shape `[batch_size, cell.state_size]`. If `cell.state_size` is a tuple, this should be a tuple of tensors having shapes `[batch_size, s] for s in cell.state_size`.
dtype: (optional) The data type for the initial state and expected output. Required if initial_state is not provided or RNN state has a heterogeneous dtype.
sequence_length: Specifies the length of each sequence in inputs. An int32 or int64 vector (tensor) size `[batch_size]`, values in `[0, T)`.
scope: VariableScope for the created subgraph; defaults to "rnn".

There are two return values: the outputs and the cell state. We usually only care about the outputs, which are also a list with one element per input time step. Each element has shape [batch_size,num_units].

Note: the input must be converted from the usual single tensor into a list of tensors. Likewise, when using the outputs, take the output of the last time step for the subsequent computation.
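
The conversion from one tensor to a list of per-step tensors can be pictured with plain NumPy. This sketch only mimics what unstacking along the time axis does; the array contents are made up for illustration.

```python
import numpy as np

batch_size, n_steps, n_input = 2, 4, 5
x = np.arange(batch_size * n_steps * n_input).reshape(batch_size, n_steps, n_input)

# One [batch_size, n_input] array per time step, in time order
steps = [x[:, t, :] for t in range(n_steps)]
print(len(steps), steps[0].shape)

# The last list element is the last time step of every sample in the batch
last = steps[-1]
print(last.shape)
```

A static RNN returns its outputs in the same list form, so indexing with [-1] selects the final time step.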

2. Dynamic RNN

def tf.nn.dynamic_rnn(cell, inputs, sequence_length=None,
       initial_state=None, dtype=None, parallel_iterations=None,
       swap_memory=False, time_major=False, scope=None):

cell: an instantiated cell object.
inputs: If `time_major == False` (default), this must be a `Tensor` of shape:`[batch_size, max_time, ...]`, or a nested tuple of such elements. If `time_major == True`, this must be a `Tensor` of shape: `[max_time, batch_size, ...]`, or a nested tuple of such elements. In other words, the input is a single tensor, by default three-dimensional, [batch_size,max_time,...], where batch_size is the number of samples in a batch, max_time is the total number of time steps, and the remaining dimensions hold the input data of one time step.
sequence_length: Specifies the length of each sequence in inputs. An int32 or int64 vector (tensor) size `[batch_size]`, values in `[0, T)`.
initial_state: (optional) An initial state for the RNN. If `cell.state_size` is an integer, this must be a `Tensor` of appropriate type and shape `[batch_size, cell.state_size]`. If `cell.state_size` is a tuple, this should be a tuple of tensors having shapes `[batch_size, s] for s in cell.state_size`.
dtype: the data type of the initial state and the expected output.
parallel_iterations: (Default: 32). The number of iterations to run inparallel. Those operations which do not have any temporal dependency and can be run in parallel, will be. This parameter trades off time for space. Values >> 1 use more memory but take less time, while smaller values use less memory but computations take longer.
swap_memory: Transparently swap the tensors produced in forward inference but needed for back prop from GPU to CPU. This allows training RNNs which would typically not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: The shape format of the `inputs` and `outputs` Tensors. If true, these `Tensors` must be shaped `[max_time, batch_size, depth]`. If false, these `Tensors` must be shaped `[batch_size, max_time, depth]`. Using `time_major = True` is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form.
scope: VariableScope for the created subgraph; defaults to "rnn".

Return value: the outputs and the cell state.

A pair (outputs, state) where:

outputs: The RNN output `Tensor`.

If time_major == False (default), this will be a `Tensor` shaped: `[batch_size, max_time, cell.output_size]`.

If time_major == True, this will be a `Tensor` shaped: `[max_time, batch_size, cell.output_size]`.

Note, if `cell.output_size` is a (possibly nested) tuple of integers or `TensorShape` objects, then `outputs` will be a tuple having the same structure as `cell.output_size`, containing Tensors having shapes corresponding to the shape data in `cell.output_size`.

state: The final state. If `cell.state_size` is an int, this will be shaped `[batch_size, cell.state_size]`. If it is a `TensorShape`, this will be shaped `[batch_size] + cell.state_size`. If it is a (possibly nested) tuple of ints or `TensorShape`, this will be a tuple having the corresponding shapes.

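Since time_major defaults to False, the result is a tensor of the form [batch_size,max_time,...].

Note: when the output is batch-major, [batch_size,max_time,...], and we want the output of the last time step, a convenient option is to transpose it to time-major form so that outputs[-1] selects the last step (indexing outputs[:, -1, :] directly is equivalent).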

outputs = tf.transpose(outputs,[1,0,2])
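
One caveat applies when sequence_length is passed to dynamic_rnn: outputs past each sequence's own length are zero, so the last time step is all zeros for the shorter sequences. The NumPy sketch below uses a made-up stand-in for the outputs tensor to contrast taking the last step with taking each sample's last valid step; the array values and variable names are assumptions for illustration.

```python
import numpy as np

# Stand-in for dynamic_rnn outputs, batch-major: [batch_size, max_time, num_units]
outputs = np.arange(1, 2 * 4 * 3 + 1, dtype=np.float64).reshape(2, 4, 3)
seq_lengths = np.array([4, 1])

# Mimic the zero padding of outputs past each sequence's length
mask = np.arange(outputs.shape[1])[None, :] < seq_lengths[:, None]   # [batch, max_time]
outputs = outputs * mask[:, :, None]

# Option 1: transpose to time-major and take the last step
last_step = np.transpose(outputs, (1, 0, 2))[-1]

# Option 2: per sample, take the output at its own last valid step
last_valid = outputs[np.arange(outputs.shape[0]), seq_lengths - 1]

print(last_step)
print(last_valid)
```

For the second sample (length 1), option 1 yields zeros while option 2 yields the output of its only real step. When all sequences have full length, as in the MNIST example, the two options agree.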

3. Bidirectional RNN

A bidirectional RNN is a recurrent network that can learn patterns in both directions. TensorFlow offers four functions for building one.

1. Static single-layer bidirectional RNN

def tf.contrib.rnn.static_bidirectional_rnn(cell_fw, cell_bw, inputs, initial_state_fw=None, initial_state_bw=None, dtype=None, sequence_length=None, scope=None):

cell_fw: An instance of RNNCell, to be used for forward direction.
cell_bw: An instance of RNNCell, to be used for backward direction.
inputs: A length T list of inputs, each a tensor of shape [batch_size, input_size], or a nested tuple of such elements. T is the total number of time steps.
initial_state_fw: (optional) An initial state for the forward RNN. This must be a tensor of appropriate type and shape `[batch_size, cell_fw.state_size]`. If `cell_fw.state_size` is a tuple, this should be a tuple of tensors having shapes `[batch_size, s] for s in cell_fw.state_size`. Defaults to a zero state.
initial_state_bw: (optional) Same as for `initial_state_fw`, but using the corresponding properties of `cell_bw`. Defaults to a zero state.
dtype: (optional) The data type for the initial state. Required if either of the initial states are not provided.
sequence_length: (optional) An int32/int64 vector, size `[batch_size]`, containing the actual lengths for each of the sequences.
scope: VariableScope for the created subgraph; defaults to "bidirectional_rnn".

The return value is a tuple (outputs, output_state_fw, output_state_bw). outputs is a list of length T in which each element already contains the forward and backward outputs concatenated, so there is no need to join them with tf.concat.

2. Static multi-layer bidirectional RNN

def tf.contrib.rnn.stack_bidirectional_rnn(cells_fw, cells_bw, inputs, initial_states_fw=None, initial_states_bw=None, dtype=None, sequence_length=None, scope=None):

cells_fw: List of instances of RNNCell, one per layer, to be used for forward direction.
cells_bw: List of instances of RNNCell, one per layer, to be used for backward direction.
inputs: A length T list of inputs, each a tensor of shape [batch_size, input_size], or a nested tuple of such elements. T is the total number of time steps.
initial_states_fw: (optional) A list of the initial states (one per layer) for the forward RNN. Each tensor must have an appropriate type and shape `[batch_size, cell_fw.state_size]`. Defaults to zero states.
initial_states_bw: (optional) Same as for `initial_states_fw`, but using the corresponding properties of `cells_bw`. Defaults to zero states.
dtype: (optional) The data type for the initial state. Required if either of the initial states are not provided.
sequence_length: (optional) An int32/int64 vector, size `[batch_size]`, containing the actual lengths for each of the sequences.
scope: VariableScope for the created subgraph; defaults to None.

3. Dynamic single-layer bidirectional RNN

def tf.nn.bidirectional_dynamic_rnn(cell_fw, cell_bw, inputs, sequence_length=None, initial_state_fw=None, initial_state_bw=None, dtype=None, parallel_iterations=None, swap_memory=False, time_major=False, scope=None):

cell_fw: An instance of RNNCell, to be used for forward direction.
cell_bw: An instance of RNNCell, to be used for backward direction.
inputs: The RNN inputs. If time_major == False (default), this must be a tensor of shape: `[batch_size, max_time, ...]`, or a nested tuple of such elements. If time_major == True, this must be a tensor of shape:`[max_time, batch_size, ...]`, or a nested tuple of such elements. As with dynamic_rnn, batch_size is the number of samples in a batch, max_time is the total number of time steps, and the remaining dimensions hold the input data of one time step.
sequence_length: (optional) An int32/int64 vector, size `[batch_size]`, containing the actual lengths for each of the sequences in the batch. If not provided, all batch entries are assumed to be full sequences; and time reversal is applied from time `0` to `max_time` for each sequence.
initial_state_fw: (optional) An initial state for the forward RNN. This must be a tensor of appropriate type and shape `[batch_size, cell_fw.state_size]`. If `cell_fw.state_size` is a tuple, this should be a tuple of tensors having shapes `[batch_size, s] for s in cell_fw.state_size`. Defaults to a zero state.
initial_state_bw: (optional) Same as for `initial_state_fw`, but using the corresponding properties of `cell_bw`. Defaults to a zero state.
dtype: (optional) The data type for the initial states and expected output. Required if initial_states are not provided or RNN states have a heterogeneous dtype.
parallel_iterations: (Default: 32). The number of iterations to run in parallel. Those operations which do not have any temporal dependency and can be run in parallel, will be. This parameter trades off time for space. Values >> 1 use more memory but take less time, while smaller values use less memory but computations take longer.
swap_memory: Transparently swap the tensors produced in forward inference but needed for back prop from GPU to CPU. This allows training RNNs which would typically not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: The shape format of the `inputs` and `outputs` Tensors.If true, these `Tensors` must be shaped `[max_time, batch_size, depth]`. If false, these `Tensors` must be shaped `[batch_size, max_time, depth]`. Using `time_major = True` is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form.
scope: VariableScope for the created subgraph; defaults to "bidirectional_rnn".

The return value is a tuple (outputs, output_states). outputs is itself a tuple (output_fw, output_bw); by default (time_major=False) each of the two is a tensor of shape [batch_size,max_time,layers_output]. To get the combined result, concatenate the forward and backward outputs along the last axis with tf.concat.

   hiddens = tf.concat(hiddens,axis=2)

Beyond that, we usually also convert the result to a time-major tensor.

 hiddens = tf.transpose(hiddens,[1,0,2])

4. Dynamic multi-layer bidirectional RNN

def tf.contrib.rnn.stack_bidirectional_dynamic_rnn(cells_fw, cells_bw, inputs, initial_states_fw=None, initial_states_bw=None, dtype=None, sequence_length=None, parallel_iterations=None, scope=None):

cells_fw: List of instances of RNNCell, one per layer, to be used for forward direction.
cells_bw: List of instances of RNNCell, one per layer, to be used for backward direction.
inputs: The RNN inputs. this must be a tensor of shape:`[batch_size, max_time, ...]`, or a nested tuple of such elements. batch_size is the number of samples in a batch, max_time is the total number of time steps, and the remaining dimensions hold the input data of one time step.
initial_states_fw: (optional) A list of the initial states (one per layer) for the forward RNN. Each tensor must have an appropriate type and shape `[batch_size, cell_fw.state_size]`. Defaults to zero states.
initial_states_bw: (optional) Same as for `initial_states_fw`, but using the corresponding properties of `cells_bw`. Defaults to zero states.
dtype: (optional) The data type for the initial state. Required if either of the initial states are not provided.
sequence_length: (optional) An int32/int64 vector, size `[batch_size]`, containing the actual lengths for each of the sequences.
parallel_iterations: (Default: 32). The number of iterations to run in parallel. Those operations which do not have any temporal dependency and can be run in parallel, will be. This parameter trades off time for space. Values >> 1 use more memory but take less time, while smaller values use less memory but computations take longer.
scope: VariableScope for the created subgraph; defaults to None.

The return value is a tuple (outputs, output_state_fw, output_state_bw). outputs is a tensor of shape [batch_size,max_time,layers_output], where layers_output already contains the forward and backward outputs concatenated.

We usually also convert it to a time-major tensor.

 hiddens = tf.transpose(hiddens,[1,0,2])

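VI. A single-layer unidirectional RNN in TensorFlow

We use the MNIST dataset as the data source and build an RNN to classify it. Each image is 28x28, so we treat every image as a sequence of 28 time steps with 28 values per step and feed that sequence into the RNN.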
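
The reshaping of a flattened image into a sequence can be checked in isolation with NumPy. The sketch below uses two made-up "images" filled with consecutive numbers in place of real MNIST data; only the shapes match the setup described above.

```python
import numpy as np

n_steps, n_input = 28, 28

# Two fake flattened images of 784 values each, standing in for an MNIST batch
x_batch = np.arange(2 * 784, dtype=np.float32).reshape(2, 784)

# [batch_size, 784] -> [batch_size, n_steps, n_input]
x_seq = x_batch.reshape([-1, n_steps, n_input])
print(x_seq.shape)

# Time step t of a sample is row t of the image: values 28*t .. 28*t+27 of the flat vector
t = 1
print(np.array_equal(x_seq[0, t], x_batch[0, 28 * t:28 * (t + 1)]))
```

Each image row thus becomes one time step, so the network reads an image from top to bottom.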

# -*- coding: utf-8 -*-
"""
Created on Fri May 11 11:49:52 2018

@author: zy
"""

'''
Single-layer RNNs with the TensorFlow library, using LSTM cells, GRU cells,
and the static_rnn and dynamic_rnn functions
'''

import tensorflow as tf
import numpy as np
tf.reset_default_graph()

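'''
Part 1: using a dynamic RNN on variable-length sequences
'''
np.random.seed(0)

# Create input data from a normal distribution. 2: batch size, 4: number of time steps, 5: size of the data at each step
X = np.random.randn(2,4,5)

# The second sample has length 1: zero out everything after its first time step
X[1,1:] = 0
# Length of each input sequence
seq_lengths = [4,1]
print('X:\n',X)

# Build one LSTM cell and one GRU cell to compare their output states. 3 is the number of hidden nodes
cell = tf.contrib.rnn.BasicLSTMCell(num_units = 3,state_is_tuple = True)
gru = tf.contrib.rnn.GRUCell(3)

# If initial_state is not given, dtype must be specified
outputs,last_states = tf.nn.dynamic_rnn(cell,X,seq_lengths,dtype =tf.float64 )
gruoutputs,grulast_states = tf.nn.dynamic_rnn(gru,X,seq_lengths,dtype =tf.float64 )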

sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())

result,sta,gruout,grusta = sess.run([outputs,last_states,gruoutputs,grulast_states])

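print('Full sequence:\n',result[0])
print('Short sequence:\n',result[1])

# state_is_tuple is True in BasicLSTMCell, so the LSTM state is the pair (cell state c, output h)
print('LSTM state:',len(sta),'\n',sta[1])  

print('GRU full sequence:\n',gruout[0])
print('GRU short sequence:\n',gruout[1])
# A GRU has no separate state output: its state is its final output. The batch holds two samples, so the length is 2
print('GRU state:',len(grusta),'\n',grusta[1]) 

# Close the interactive session so that tf.reset_default_graph() can be called safely later
sess.close()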




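'''
Part 2: a single-layer unidirectional RNN that classifies the MNIST dataset
'''
'''
One MNIST sample is 28 x 28
We split a sample into 28 time steps of 28 values each and feed them into an LSTM or GRU network
The hidden layer has 128 nodes
'''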


def single_layer_static_lstm(input_x,n_steps,n_hidden):
    '''
    Return the outputs and cell state of a static single-layer LSTM

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of the LSTM cell, i.e. hidden-layer nodes
    '''

    # Split input_x along the time axis (axis=1) into a list of n_steps tensors,
    # e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    # The static rnn functions need this step; it effectively makes time the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)

    # Can be viewed as the hidden layer
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias=1.0)
    # static_rnn takes a list of tensors, each of shape (batch_size,n_input)
    hiddens,states = tf.contrib.rnn.static_rnn(cell=lstm_cell, inputs=input_x1, dtype=tf.float32)

    return hiddens,states


def single_layer_static_gru(input_x,n_steps,n_hidden):
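    '''
    Return the outputs and cell state of a static single-layer GRU

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of the GRU cell, i.e. hidden-layer nodes
    '''

    # Split input_x along the time axis (axis=1) into a list of n_steps tensors,
    # e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    # The static rnn functions need this step; it effectively makes time the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)

    # Can be viewed as the hidden layer
    gru_cell = tf.contrib.rnn.GRUCell(num_units=n_hidden)
    # static_rnn takes a list of tensors, each of shape (batch_size,n_input)
    hiddens,states = tf.contrib.rnn.static_rnn(cell=gru_cell,inputs=input_x1,dtype=tf.float32)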
        
    return hiddens,states


def single_layer_dynamic_lstm(input_x,n_steps,n_hidden):
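    '''
    Return the outputs and cell state of a dynamic single-layer LSTM

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of the LSTM cell, i.e. hidden-layer nodes
    '''
    # Can be viewed as the hidden layer
    lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias=1.0)
    # dynamic_rnn takes a 3-D tensor [batch_size,n_steps,n_input]; the output has the same layout
    hiddens,states = tf.nn.dynamic_rnn(cell=lstm_cell,inputs=input_x,dtype=tf.float32)

    # Transpose the output to time-major so that hiddens[-1] is the last time step
    hiddens = tf.transpose(hiddens,[1,0,2])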
    return hiddens,states



def single_layer_dynamic_gru(input_x,n_steps,n_hidden):
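    '''
    Return the outputs and cell state of a dynamic single-layer GRU

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of the GRU cell, i.e. hidden-layer nodes
    '''

    # Can be viewed as the hidden layer
    gru_cell = tf.contrib.rnn.GRUCell(num_units=n_hidden)
    # dynamic_rnn takes a 3-D tensor [batch_size,n_steps,n_input]; the output has the same layout
    hiddens,states = tf.nn.dynamic_rnn(cell=gru_cell,inputs=input_x,dtype=tf.float32)

    # Transpose the output to time-major so that hiddens[-1] is the last time step
    hiddens = tf.transpose(hiddens,[1,0,2])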
    return hiddens,states


def  mnist_rnn_classfication(flag):
    '''
    Classify MNIST

    arg:
        flag: which RNN structure to build
            1: single-layer static LSTM
            2: single-layer static GRU
            3: single-layer dynamic LSTM
            4: single-layer dynamic GRU
    '''

    '''
    1. Load the dataset
    '''
    tf.reset_default_graph()
    from tensorflow.examples.tutorials.mnist import input_data

    # mnist is a lightweight class that stores the training, validation and test sets as numpy arrays; one_hot gives 10-dimensional one-hot labels
    mnist = input_data.read_data_sets('MNIST-data',one_hot=True)

    print(type(mnist)) #<class 'tensorflow.contrib.learn.python.learn.datasets.base.Datasets'>

    print('Training data shape:',mnist.train.images.shape)           #Training data shape: (55000, 784)
    print('Test data shape:',mnist.test.images.shape)                #Test data shape: (10000, 784)
    print('Validation data shape:',mnist.validation.images.shape)    #Validation data shape: (5000, 784)
    print('Training label shape:',mnist.train.labels.shape)          #Training label shape: (55000, 10)
    
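    '''
    2. Define the parameters and the network structure
    '''
    n_input = 28             # number of input nodes of the LSTM cell
    n_steps = 28             # sequence length
    n_hidden = 128           # number of output nodes of the LSTM cell (hidden-layer nodes)
    n_classes = 10           # number of classes
    batch_size = 128         # mini-batch size
    training_step = 5000     # number of iterations
    display_step  = 200      # report every this many steps
    learning_rate = 1e-4     # learning rate


    # Define placeholders
    # First dimension: number of samples in a batch; n_steps: number of time steps; n_input: size of the data at each step
    # i.e. 28 time steps, each feeding 28 values into the network
    input_x = tf.placeholder(dtype=tf.float32,shape=[None,n_steps,n_input])
    input_y = tf.placeholder(dtype=tf.float32,shape=[None,n_classes])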


    # Can be viewed as the hidden layer
    if  flag == 1:
        print('Single-layer static LSTM network:')
        hiddens,states = single_layer_static_lstm(input_x,n_steps,n_hidden)
    elif flag == 2:
        print('Single-layer static GRU network:')
        hiddens,states = single_layer_static_gru(input_x,n_steps,n_hidden)
    elif  flag == 3:
        print('Single-layer dynamic LSTM network:')
        hiddens,states = single_layer_dynamic_lstm(input_x,n_steps,n_hidden)
    elif flag == 4:
        print('Single-layer dynamic GRU network:')
        hiddens,states = single_layer_dynamic_gru(input_x,n_steps,n_hidden)

    print('hidden:',hiddens[-1].shape)      #(?, 128): the batch dimension is unknown when the graph is built

    # Take the output of the last time step and pass it through a fully connected layer
    output = tf.contrib.layers.fully_connected(inputs=hiddens[-1],num_outputs=n_classes,activation_fn = tf.nn.softmax)
    
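    '''
    3. Set up the log-likelihood loss
    '''
    # Cost function J = -(Σ y.log(aL))/n    where . is the elementwise product
    # The softmax output is clipped away from 0 so that tf.log never receives 0
    cost = tf.reduce_mean(-tf.reduce_sum(input_y*tf.log(tf.clip_by_value(output,1e-10,1.0)),axis=1))

    '''
    4. Optimize
    '''
    train = tf.train.AdamOptimizer(learning_rate).minimize(cost)

    # Evaluate the predictions
    # tf.argmax(output,1) returns the index of the largest value in each row
    correct = tf.equal(tf.argmax(output,1),tf.argmax(input_y,1))       # boolean vector: whether each prediction is correct
    accuracy = tf.reduce_mean(tf.cast(correct,tf.float32))             # accuracy


    # Lists that store the result of each test batch
    test_accuracy_list = []
    test_cost_list=[]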
    
    
    with tf.Session() as sess:
        # Run the graph in a session
        sess.run(tf.global_variables_initializer())   # initialize variables

        # Start iterating: mini-batch gradient descent with the Adam optimizer
        for i in range(training_step): 
            x_batch,y_batch = mnist.train.next_batch(batch_size = batch_size)   
            #Reshape data to get 28 seq of 28 elements
            x_batch = x_batch.reshape([-1,n_steps,n_input])

            # Training step
            train.run(feed_dict={input_x:x_batch,input_y:y_batch})   
            if (i+1) % display_step == 0:
                # Report accuracy and cost on the current training batch
                training_accuracy,training_cost = sess.run([accuracy,cost],feed_dict={input_x:x_batch,input_y:y_batch})   
                print('Step {0}:Training set accuracy {1},cost {2}.'.format(i+1,training_accuracy,training_cost))


        # After training, evaluate on the test set in 200 batches of 50 samples
        # Evaluating the whole test set at once may run out of memory (OOM), so use small mini-batches
        for i in range(200):        
            x_batch,y_batch = mnist.test.next_batch(batch_size = 50)      
            #Reshape data to get 28 seq of 28 elements
            x_batch = x_batch.reshape([-1,n_steps,n_input])
            test_accuracy,test_cost = sess.run([accuracy,cost],feed_dict={input_x:x_batch,input_y:y_batch})
            test_accuracy_list.append(test_accuracy)
            test_cost_list.append(test_cost) 
            if (i+1)% 20 == 0:
                print('Step {0}:Test set accuracy {1},cost {2}.'.format(i+1,test_accuracy,test_cost)) 
        print('Test accuracy:',np.mean(test_accuracy_list))


if __name__ == '__main__':
    mnist_rnn_classfication(1)    # 1: single-layer static LSTM
    mnist_rnn_classfication(2)    # 2: single-layer static GRU
    mnist_rnn_classfication(3)    # 3: single-layer dynamic LSTM
    mnist_rnn_classfication(4)    # 4: single-layer dynamic GRU


VII. A multi-layer unidirectional RNN in TensorFlow

# -*- coding: utf-8 -*-
"""
Created on Fri May 11 16:29:11 2018

@author: zy
"""

'''
Multi-layer RNNs with the TensorFlow library, using LSTM cells, GRU cells,
and the static_rnn and dynamic_rnn functions
'''

import tensorflow as tf
import numpy as np


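'''
A multi-layer unidirectional RNN that classifies the MNIST dataset
'''
'''
One MNIST sample is 28 x 28
We split a sample into 28 time steps of 28 values each and feed them into an LSTM or GRU network
The hidden layer has 128 nodes
'''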


def multi_layer_static_lstm(input_x,n_steps,n_hidden):
    '''
    Return the outputs and cell state of a static multi-layer LSTM

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of each LSTM cell, i.e. hidden-layer nodes
    '''

    # Split input_x along the time axis (axis=1) into a list of n_steps tensors,
    # e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    # The static rnn functions need this step; it effectively makes time the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)

    # Can be viewed as 3 hidden layers
    stacked_rnn = []
    for i in range(3):
        stacked_rnn.append(tf.contrib.rnn.LSTMCell(num_units=n_hidden))

    # Multi-layer RNN: with cells=[cell1,cell2] there are two layers, and data passes through cell1 and then cell2
    mcell = tf.contrib.rnn.MultiRNNCell(cells=stacked_rnn)

    # static_rnn takes a list of tensors, each of shape (batch_size,n_input)
    hiddens,states = tf.contrib.rnn.static_rnn(cell=mcell,inputs=input_x1,dtype=tf.float32)

    return hiddens,states


def multi_layer_static_gru(input_x,n_steps,n_hidden):
    '''
    Return the outputs and cell state of a static multi-layer GRU

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of each GRU cell, i.e. hidden-layer nodes
    '''

    # Split input_x along the time axis (axis=1) into a list of n_steps tensors,
    # e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    # The static rnn functions need this step; it effectively makes time the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)

    # Can be viewed as 3 hidden layers
    stacked_rnn = []
    for i in range(3):
        stacked_rnn.append(tf.contrib.rnn.GRUCell(num_units=n_hidden))

    # Multi-layer RNN: with cells=[cell1,cell2] there are two layers, and data passes through cell1 and then cell2
    mcell = tf.contrib.rnn.MultiRNNCell(cells=stacked_rnn)

    # static_rnn takes a list of tensors, each of shape (batch_size,n_input)
    hiddens,states = tf.contrib.rnn.static_rnn(cell=mcell,inputs=input_x1,dtype=tf.float32)
        
    return hiddens,states


def multi_layer_static_mix(input_x,n_steps,n_hidden):
    '''
    Return the outputs and cell state of a static multi-layer network that mixes LSTM and GRU cells

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of the LSTM cell; the GRU cell uses twice as many
    '''

    # Split input_x along the time axis (axis=1) into a list of n_steps tensors,
    # e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    # The static rnn functions need this step; it effectively makes time the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)

    # Can be viewed as 2 hidden layers
    gru_cell = tf.contrib.rnn.GRUCell(num_units=n_hidden*2)
    lstm_cell = tf.contrib.rnn.LSTMCell(num_units=n_hidden)

    # Multi-layer RNN: data passes through lstm_cell first and then through gru_cell
    mcell = tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell,gru_cell])

    # static_rnn takes a list of tensors, each of shape (batch_size,n_input)
    hiddens,states = tf.contrib.rnn.static_rnn(cell=mcell,inputs=input_x1,dtype=tf.float32)
    
    return hiddens,states


def multi_layer_dynamic_lstm(input_x,n_steps,n_hidden):
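    '''
    Return the outputs and cell state of a dynamic multi-layer LSTM

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of each LSTM cell, i.e. hidden-layer nodes
    '''
    # Can be viewed as 3 hidden layers
    stacked_rnn = []
    for i in range(3):
        stacked_rnn.append(tf.contrib.rnn.LSTMCell(num_units=n_hidden))

    # Multi-layer RNN: with cells=[cell1,cell2] there are two layers, and data passes through cell1 and then cell2
    mcell = tf.contrib.rnn.MultiRNNCell(cells=stacked_rnn)

    # dynamic_rnn takes a 3-D tensor [batch_size,n_steps,n_input]; the output has the same layout
    hiddens,states = tf.nn.dynamic_rnn(cell=mcell,inputs=input_x,dtype=tf.float32)

    # Transpose the output to time-major so that hiddens[-1] is the last time step
    hiddens = tf.transpose(hiddens,[1,0,2])    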
    return hiddens,states


def multi_layer_dynamic_gru(input_x,n_steps,n_hidden):
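    '''
    Return the outputs and cell state of a dynamic multi-layer GRU

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of each GRU cell, i.e. hidden-layer nodes
    '''
    # Can be viewed as 3 hidden layers
    stacked_rnn = []
    for i in range(3):
        stacked_rnn.append(tf.contrib.rnn.GRUCell(num_units=n_hidden))

    # Multi-layer RNN: with cells=[cell1,cell2] there are two layers, and data passes through cell1 and then cell2
    mcell = tf.contrib.rnn.MultiRNNCell(cells=stacked_rnn)

    # dynamic_rnn takes a 3-D tensor [batch_size,n_steps,n_input]; the output has the same layout
    hiddens,states = tf.nn.dynamic_rnn(cell=mcell,inputs=input_x,dtype=tf.float32)

    # Transpose the output to time-major so that hiddens[-1] is the last time step
    hiddens = tf.transpose(hiddens,[1,0,2])    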
    return hiddens,states   



def multi_layer_dynamic_mix(input_x,n_steps,n_hidden):
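    '''
    Return the outputs and cell state of a dynamic multi-layer network that mixes LSTM and GRU cells

    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: number of time steps
        n_hidden: number of output nodes of the LSTM cell; the GRU cell uses twice as many
    '''

    # Can be viewed as 2 hidden layers
    gru_cell = tf.contrib.rnn.GRUCell(num_units=n_hidden*2)
    lstm_cell = tf.contrib.rnn.LSTMCell(num_units=n_hidden)

    # Multi-layer RNN: data passes through lstm_cell first and then through gru_cell
    mcell = tf.contrib.rnn.MultiRNNCell(cells=[lstm_cell,gru_cell])

    # dynamic_rnn takes a 3-D tensor [batch_size,n_steps,n_input]; the output has the same layout
    hiddens,states = tf.nn.dynamic_rnn(cell=mcell,inputs=input_x,dtype=tf.float32)

    # Transpose the output to time-major so that hiddens[-1] is the last time step
    hiddens = tf.transpose(hiddens,[1,0,2])    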
    return hiddens,states



def  mnist_rnn_classfication(flag):
    '''
    Classify MNIST

    arg:
        flag: which RNN structure to build
            1: multi-layer static LSTM
            2: multi-layer static GRU
            3: multi-layer static LSTM and GRU mix
            4: multi-layer dynamic LSTM
            5: multi-layer dynamic GRU
            6: multi-layer dynamic LSTM and GRU mix
    '''

    '''
    1. Load the dataset
    '''
    tf.reset_default_graph()
    from tensorflow.examples.tutorials.mnist import input_data

    # mnist is a lightweight class that stores the training, validation and test sets as numpy arrays; one_hot gives 10-dimensional one-hot labels
    mnist = input_data.read_data_sets('MNIST-data',one_hot=True)

    print(type(mnist)) #<class 'tensorflow.contrib.learn.python.learn.datasets.base.Datasets'>

    print('Training data shape:',mnist.train.images.shape)           #Training data shape: (55000, 784)
    print('Test data shape:',mnist.test.images.shape)                #Test data shape: (10000, 784)
    print('Validation data shape:',mnist.validation.images.shape)    #Validation data shape: (5000, 784)
    print('Training label shape:',mnist.train.labels.shape)          #Training label shape: (55000, 10)
    
    '''
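    2. Define the parameters and the network structure
    '''
    n_input = 28             # number of input nodes of the LSTM cell
    n_steps = 28             # sequence length
    n_hidden = 128           # number of output nodes of the LSTM cell (i.e. hidden layer size)
    n_classes = 10           # number of classes
    batch_size = 128         # mini-batch size
    training_step = 1000     # number of training iterations
    display_step  = 200      # report progress every display_step iterations
    learning_rate = 1e-4     # learning rate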
    
    
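    # Define the placeholders
    # batch_size: number of samples per batch  n_steps: number of time steps  n_input: length of the data at one time step, i.e. 28 time steps in total, each feeding 28 values into the LSTM network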
    input_x = tf.placeholder(dtype=tf.float32,shape=[None,n_steps,n_input])
    input_y = tf.placeholder(dtype=tf.float32,shape=[None,n_classes])


    # Build the recurrent part, which can be viewed as the hidden layers
    if  flag == 1:
        print('Multi-layer static LSTM network:')
        hiddens,states = multi_layer_static_lstm(input_x,n_steps,n_hidden)
    elif flag == 2:
        print('Multi-layer static GRU network:')
        hiddens,states = multi_layer_static_gru(input_x,n_steps,n_hidden)
    elif flag == 3:
        print('Multi-layer static LSTM/GRU mixed network:')
        hiddens,states = multi_layer_static_mix(input_x,n_steps,n_hidden)
    elif  flag == 4:
        print('Multi-layer dynamic LSTM network:')
        hiddens,states = multi_layer_dynamic_lstm(input_x,n_steps,n_hidden)
    elif flag == 5:
        print('Multi-layer dynamic GRU network:')
        hiddens,states = multi_layer_dynamic_gru(input_x,n_steps,n_hidden)
    elif flag == 6:
        print('Multi-layer dynamic LSTM/GRU mixed network:')
        hiddens,states = multi_layer_dynamic_mix(input_x,n_steps,n_hidden)
                
    print('hidden:',hiddens[-1].shape)      # (?, 128) when the top cell has n_hidden units; (?, 256) when it is the n_hidden*2 GRU cell, as in flag 6
    
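    # Take the output of the last time step and pass it through a fully connected layer to get the class probabilities
    output = tf.contrib.layers.fully_connected(inputs=hiddens[-1],num_outputs=n_classes,activation_fn = tf.nn.softmax)
    
    '''
    3. Define the log-likelihood (cross-entropy) loss
    '''
    # Cost function J = -(Σ y.log(aL))/n, where . denotes element-wise multiplication
    cost = tf.reduce_mean(-tf.reduce_sum(input_y*tf.log(output),axis=1))
    
    '''
    4. Optimize
    '''
    train = tf.train.AdamOptimizer(learning_rate).minimize(cost)
    
    # Evaluate the predictions
    # tf.argmax(output,1) returns the index of the largest value in each row
    correct = tf.equal(tf.argmax(output,1),tf.argmax(input_y,1))       # boolean tensor marking each prediction as correct or wrong
    accuracy = tf.reduce_mean(tf.cast(correct,tf.float32))             # accuracy
    
    
    # Lists that collect the result of every test batch
    test_accuracy_list = []
    test_cost_list=[]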
    
    
    with tf.Session() as sess:
        # Run the graph inside the session
        sess.run(tf.global_variables_initializer())   # initialize the variables
        
        # Start training: mini-batch gradient descent with the Adam optimizer
        for i in range(training_step):
            x_batch,y_batch = mnist.train.next_batch(batch_size = batch_size)
            # Reshape data to get 28 seq of 28 elements
            x_batch = x_batch.reshape([-1,n_steps,n_input])
            
            # Run one training step
            train.run(feed_dict={input_x:x_batch,input_y:y_batch})
            if (i+1) % display_step == 0:
                # Report accuracy and cost on the current training batch
                training_accuracy,training_cost = sess.run([accuracy,cost],feed_dict={input_x:x_batch,input_y:y_batch})
                print('Step {0}:Training set accuracy {1},cost {2}.'.format(i+1,training_accuracy,training_cost))
        
        
        # After training, evaluate on the test set in 200 batches of 50 samples each
        # Evaluating the whole test set at once can run out of memory (OOM), so small mini-batches are used instead
        for i in range(200):
            x_batch,y_batch = mnist.test.next_batch(batch_size = 50)
            # Reshape data to get 28 seq of 28 elements
            x_batch = x_batch.reshape([-1,n_steps,n_input])
            test_accuracy,test_cost = sess.run([accuracy,cost],feed_dict={input_x:x_batch,input_y:y_batch})
            test_accuracy_list.append(test_accuracy)
            test_cost_list.append(test_cost)
            if (i+1)% 20 == 0:
                print('Step {0}:Test set accuracy {1},cost {2}.'.format(i+1,test_accuracy,test_cost))
        print('Test accuracy:',np.mean(test_accuracy_list))


if __name__ == '__main__':
    mnist_rnn_classfication(1)    # 1: multi-layer static LSTM
    mnist_rnn_classfication(2)    # 2: multi-layer static GRU
    mnist_rnn_classfication(3)    # 3: multi-layer static LSTM/GRU mix
    mnist_rnn_classfication(4)    # 4: multi-layer dynamic LSTM
    mnist_rnn_classfication(5)    # 5: multi-layer dynamic GRU
    mnist_rnn_classfication(6)    # 6: multi-layer dynamic LSTM/GRU mix

The above are partial screenshots of the output...
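The static and dynamic functions in the listing differ mainly in how the time axis is laid out before `hiddens[-1]` is taken. The NumPy sketch below mirrors only that shape handling and runs no RNN. The array `outputs` and the sizes are made-up stand-ins for an RNN output, not values taken from the script.

```python
import numpy as np

# Hypothetical stand-in for an RNN output in batch-major layout
# [batch_size, n_steps, n_hidden]; the sizes are illustrative only.
batch_size, n_steps, n_hidden = 4, 28, 128
outputs = np.arange(batch_size * n_steps * n_hidden, dtype=np.float32)
outputs = outputs.reshape(batch_size, n_steps, n_hidden)

# Static style: a Python list with one [batch_size, n_hidden] array per
# time step, analogous to the list a static RNN function returns.
static_style = [outputs[:, t, :] for t in range(n_steps)]

# Dynamic style: transpose to time-major [n_steps, batch_size, n_hidden],
# analogous to tf.transpose(hiddens, [1, 0, 2]).
dynamic_style = np.transpose(outputs, (1, 0, 2))

# In both layouts, index -1 selects the last time step of every sample.
last_static = static_style[-1]
last_dynamic = dynamic_style[-1]

print(len(static_style), last_static.shape)        # 28 (4, 128)
print(dynamic_style.shape, last_dynamic.shape)     # (28, 4, 128) (4, 128)
print(np.array_equal(last_static, last_dynamic))   # True
```

This is why the classifier can apply the same `hiddens[-1]` indexing to all six variants: once the dynamic outputs are transposed, both layouts put the time step first.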

八 Implementing a bidirectional RNN in TensorFlow

# -*- coding: utf-8 -*-
"""
Created on Fri May 11 21:24:41 2018

@author: zy
"""


'''
Implement bidirectional RNNs with TensorFlow: single-layer and multi-layer, built from LSTM cells, in both the static and the dynamic form
'''

import tensorflow as tf
import numpy as np


'''
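Build bidirectional RNN networks to classify the MNIST dataset
'''
'''
Each MNIST sample is a 28 x 28 image.
A sample can be split into 28 time steps of 28 values each and then fed into an LSTM or GRU network.
The hidden layer size is set to 128.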
'''


def single_layer_static_bi_lstm(input_x,n_steps,n_hidden):
    '''
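    Return the outputs and the cell states of a single-layer static bidirectional LSTM.
    
    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of the LSTM cell, i.e. the hidden layer size
    '''
    
    # Split input_x along the time axis (axis=1) into a list of n_steps tensors, e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    # The static rnn functions require this step; it is equivalent to making the sequence the first dimension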
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)



    # Forward cell
    lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0)
    # Backward cell
    lstm_bw_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0)


    # The static function takes a list of tensors, each of shape (batch_size,n_input); hiddens is a list in which every element is the forward and backward outputs concatenated
    hiddens,fw_state,bw_state = tf.contrib.rnn.static_bidirectional_rnn(cell_fw=lstm_fw_cell,cell_bw=lstm_bw_cell,inputs=input_x1,dtype=tf.float32)
        
    print('hiddens:\n',type(hiddens),len(hiddens),hiddens[0].shape,hiddens[1].shape)    #<class 'list'> 28 (?, 256) (?, 256)
    
    return hiddens,fw_state,bw_state


def single_layer_dynamic_bi_lstm(input_x,n_steps,n_hidden):
    '''
    Return the outputs and the cell states of a single-layer dynamic bidirectional LSTM.
    
    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of the LSTM cell, i.e. the hidden layer size
    '''
    
    # Forward cell
    lstm_fw_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0)
    # Backward cell
    lstm_bw_cell = tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0)

    
    # The dynamic function takes a 3-D tensor [batch_size,n_steps,n_input]; hiddens is a tuple (forward outputs, backward outputs), each of shape [batch_size,n_steps,n_hidden]
    hiddens,state = tf.nn.bidirectional_dynamic_rnn(cell_fw=lstm_fw_cell,cell_bw=lstm_bw_cell,inputs=input_x,dtype=tf.float32)
    
    print('hiddens:\n',type(hiddens),len(hiddens),hiddens[0].shape,hiddens[1].shape)   #<class 'tuple'> 2 (?, 28, 128) (?, 28, 128)
    # Concatenate along axis=2: (?,28,128) and (?,28,128) become (?,28,256)
    hiddens = tf.concat(hiddens,axis=2)
    
    # Transpose the outputs to time-major so that hiddens[-1] is the last time step
    hiddens = tf.transpose(hiddens,[1,0,2])
        
    return hiddens,state


def multi_layer_static_bi_lstm(input_x,n_steps,n_hidden):
    '''
    Return the outputs and the cell states of a multi-layer static bidirectional LSTM.
    
    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of the LSTM cell, i.e. the hidden layer size
    '''
    
    # Split input_x along the time axis (axis=1) into a list of n_steps tensors, e.g. a batch_size x 28 x 28 input becomes [(batch_size,28),(batch_size,28),...]
    # The static rnn functions require this step; it is equivalent to making the sequence the first dimension
    input_x1 = tf.unstack(input_x,num=n_steps,axis=1)

    stacked_fw_rnn = []
    stacked_bw_rnn = []
    for i in range(3):
        # Forward cell
        stacked_fw_rnn.append(tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0))
        # Backward cell
        stacked_bw_rnn.append(tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0))


    # The static function takes a list of tensors, each of shape (batch_size,n_input); hiddens is a list in which every element is the forward and backward outputs concatenated
    hiddens,fw_state,bw_state = tf.contrib.rnn.stack_bidirectional_rnn(stacked_fw_rnn,stacked_bw_rnn,inputs=input_x1,dtype=tf.float32)
        
    print('hiddens:\n',type(hiddens),len(hiddens),hiddens[0].shape,hiddens[1].shape)    #<class 'list'> 28 (?, 256) (?, 256)

    return hiddens,fw_state,bw_state


def multi_layer_dynamic_bi_lstm(input_x,n_steps,n_hidden):
    '''
    Return the outputs and the cell states of a multi-layer dynamic bidirectional LSTM.
    
    args:
        input_x: input tensor of shape [batch_size,n_steps,n_input]
        n_steps: total number of time steps
        n_hidden: number of output nodes of the LSTM cell, i.e. the hidden layer size
    '''
    stacked_fw_rnn = []
    stacked_bw_rnn = []
    for i in range(3):
        # Forward cell
        stacked_fw_rnn.append(tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0))
        # Backward cell
        stacked_bw_rnn.append(tf.contrib.rnn.BasicLSTMCell(num_units=n_hidden,forget_bias = 1.0))
    
    # The dynamic function takes a 3-D tensor [batch_size,n_steps,n_input]; the output is [batch_size,n_steps,n_hidden*2], the forward and backward outputs concatenated on the last axis
    hiddens,fw_state,bw_state = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(stacked_fw_rnn,stacked_bw_rnn,inputs=input_x,dtype=tf.float32)
    
    print('hiddens:\n',type(hiddens),hiddens.shape)   # <class 'tensorflow.python.framework.ops.Tensor'> (?, 28, 256)
        
    # Transpose the outputs to time-major so that hiddens[-1] is the last time step
    hiddens = tf.transpose(hiddens,[1,0,2])
    
    return hiddens,fw_state,bw_state





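def mnist_rnn_classfication(flag):
    '''
    Classify the MNIST digits.
    
    arg:
        flag: selects which RNN structure to build
            1: single-layer static bidirectional LSTM
            2: single-layer dynamic bidirectional LSTM
            3: multi-layer static bidirectional LSTM
            4: multi-layer dynamic bidirectional LSTM

    '''
    '''
    1. Load the dataset
    '''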
    tf.reset_default_graph()
    from tensorflow.examples.tutorials.mnist import input_data
    
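    # mnist is a lightweight class that stores the training, validation and test sets as NumPy arrays; one_hot=True encodes each label as a 10-dimensional one-hot vector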
    mnist = input_data.read_data_sets('MNIST-data',one_hot=True)
    
    print(type(mnist)) #<class 'tensorflow.contrib.learn.python.learn.datasets.base.Datasets'>
    
    print('Training data shape:',mnist.train.images.shape)           #Training data shape: (55000, 784)
    print('Test data shape:',mnist.test.images.shape)                #Test data shape: (10000, 784)
    print('Validation data shape:',mnist.validation.images.shape)    #Validation data shape: (5000, 784)
    print('Training label shape:',mnist.train.labels.shape)          #Training label shape: (55000, 10)
    
    '''
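    2. Define the parameters and the network structure
    '''
    n_input = 28             # number of input nodes of the LSTM cell
    n_steps = 28             # sequence length
    n_hidden = 128           # number of output nodes of the LSTM cell (i.e. hidden layer size)
    n_classes = 10           # number of classes
    batch_size = 128         # mini-batch size
    training_step = 1000     # number of training iterations
    display_step  = 200      # report progress every display_step iterations
    learning_rate = 1e-4     # learning rate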
    
    
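    # Define the placeholders
    # batch_size: number of samples per batch  n_steps: number of time steps  n_input: length of the data at one time step, i.e. 28 time steps in total, each feeding 28 values into the LSTM network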
    input_x = tf.placeholder(dtype=tf.float32,shape=[None,n_steps,n_input])
    input_y = tf.placeholder(dtype=tf.float32,shape=[None,n_classes])
    
    
    # Build the recurrent part, which can be viewed as the hidden layers
    if  flag == 1:
        print('Single-layer static bidirectional LSTM network:')
        hiddens,fw_state,bw_state = single_layer_static_bi_lstm(input_x,n_steps,n_hidden)
    elif flag == 2:
        print('Single-layer dynamic bidirectional LSTM network:')
        hiddens,state = single_layer_dynamic_bi_lstm(input_x,n_steps,n_hidden)
    elif flag == 3:
        print('Multi-layer static bidirectional LSTM network:')
        hiddens,fw_state,bw_state = multi_layer_static_bi_lstm(input_x,n_steps,n_hidden)
    elif  flag == 4:
        print('Multi-layer dynamic bidirectional LSTM network:')
        hiddens,fw_state,bw_state = multi_layer_dynamic_bi_lstm(input_x,n_steps,n_hidden)


    
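    # Take the output of the last time step and pass it through a fully connected layer to get the class probabilities
    output = tf.contrib.layers.fully_connected(inputs=hiddens[-1],num_outputs=n_classes,activation_fn = tf.nn.softmax)
    
    '''
    3. Define the log-likelihood (cross-entropy) loss
    '''
    # Cost function J = -(Σ y.log(aL))/n, where . denotes element-wise multiplication
    cost = tf.reduce_mean(-tf.reduce_sum(input_y*tf.log(output),axis=1))
    
    '''
    4. Optimize
    '''
    train = tf.train.AdamOptimizer(learning_rate).minimize(cost)
    
    # Evaluate the predictions
    # tf.argmax(output,1) returns the index of the largest value in each row
    correct = tf.equal(tf.argmax(output,1),tf.argmax(input_y,1))       # boolean tensor marking each prediction as correct or wrong
    accuracy = tf.reduce_mean(tf.cast(correct,tf.float32))             # accuracy
    
    
    # Lists that collect the result of every test batch
    test_accuracy_list = []
    test_cost_list=[]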
    
    
    with tf.Session() as sess:
        # Run the graph inside the session
        sess.run(tf.global_variables_initializer())   # initialize the variables
        
        # Start training: mini-batch gradient descent with the Adam optimizer
        for i in range(training_step):
            x_batch,y_batch = mnist.train.next_batch(batch_size = batch_size)
            # Reshape data to get 28 seq of 28 elements
            x_batch = x_batch.reshape([-1,n_steps,n_input])
            
            # Run one training step
            train.run(feed_dict={input_x:x_batch,input_y:y_batch})
            if (i+1) % display_step == 0:
                # Report accuracy and cost on the current training batch
                training_accuracy,training_cost = sess.run([accuracy,cost],feed_dict={input_x:x_batch,input_y:y_batch})
                print('Step {0}:Training set accuracy {1},cost {2}.'.format(i+1,training_accuracy,training_cost))
        
        
        # After training, evaluate on the test set in 200 batches of 50 samples each
        # Evaluating the whole test set at once can run out of memory (OOM), so small mini-batches are used instead
        for i in range(200):
            x_batch,y_batch = mnist.test.next_batch(batch_size = 50)
            # Reshape data to get 28 seq of 28 elements
            x_batch = x_batch.reshape([-1,n_steps,n_input])
            test_accuracy,test_cost = sess.run([accuracy,cost],feed_dict={input_x:x_batch,input_y:y_batch})
            test_accuracy_list.append(test_accuracy)
            test_cost_list.append(test_cost)
            if (i+1)% 20 == 0:
                print('Step {0}:Test set accuracy {1},cost {2}.'.format(i+1,test_accuracy,test_cost))
        print('Test accuracy:',np.mean(test_accuracy_list))
        

if __name__ == '__main__':
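    mnist_rnn_classfication(1)    # 1: single-layer static bidirectional LSTM network
    mnist_rnn_classfication(2)    # 2: single-layer dynamic bidirectional LSTM network
    mnist_rnn_classfication(3)    # 3: multi-layer static bidirectional LSTM network
    mnist_rnn_classfication(4)    # 4: multi-layer dynamic bidirectional LSTM network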
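
To make the forward/backward concatenation used in the listing more concrete, here is a minimal NumPy sketch of a bidirectional pass. It uses a plain tanh RNN in place of an LSTM cell to stay short. The helpers `simple_rnn` and `bidirectional_rnn`, the random weights and the small sizes are all hypothetical and chosen only for illustration; this is not how TensorFlow implements its bidirectional functions.

```python
import numpy as np

def simple_rnn(inputs, w_x, w_h):
    """Run a plain tanh RNN over batch-major inputs [batch, steps, n_input]."""
    batch, steps, _ = inputs.shape
    h = np.zeros((batch, w_h.shape[0]))
    outputs = []
    for t in range(steps):
        h = np.tanh(inputs[:, t, :] @ w_x + h @ w_h)
        outputs.append(h)
    return np.stack(outputs, axis=1)            # [batch, steps, n_hidden]

def bidirectional_rnn(inputs, fw_weights, bw_weights):
    """Concatenate a forward pass and a time-reversed pass on the last axis."""
    fw = simple_rnn(inputs, *fw_weights)
    # Backward direction: reverse time, run the RNN, then reverse the outputs
    # back so that position t lines up with input step t.
    bw = simple_rnn(inputs[:, ::-1, :], *bw_weights)[:, ::-1, :]
    return np.concatenate([fw, bw], axis=2)     # [batch, steps, 2*n_hidden]

# Illustrative sizes and random weights (assumptions, not from the script).
rng = np.random.default_rng(0)
batch, steps, n_input, n_hidden = 2, 28, 28, 8
x = rng.normal(size=(batch, steps, n_input))
fw_weights = (rng.normal(scale=0.1, size=(n_input, n_hidden)),
              rng.normal(scale=0.1, size=(n_hidden, n_hidden)))
bw_weights = (rng.normal(scale=0.1, size=(n_input, n_hidden)),
              rng.normal(scale=0.1, size=(n_hidden, n_hidden)))

out = bidirectional_rnn(x, fw_weights, bw_weights)
print(out.shape)                                # (2, 28, 16)

# Zero out every step except the last one: the backward half of the output
# at the last position does not change, because it has only seen that step.
x_masked = x.copy()
x_masked[:, :-1, :] = 0.0
out_masked = bidirectional_rnn(x_masked, fw_weights, bw_weights)
print(np.allclose(out[:, -1, n_hidden:], out_masked[:, -1, n_hidden:]))   # True
```

The last check points at a property to keep in mind when reading `hiddens[-1]` from a bidirectional network: the forward half of that vector summarizes the whole sequence, while the backward half has processed only the final time step.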