A Detailed Walkthrough of a Simple LSTM Example

Reprinted from the original post linked below. 2017-04-24 13:05, 1990, python / LSTM / data mining and machine learning

http://blog.csdn.net/zjm750617105/article/details/51321889

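I wrote this example during my first couple of days learning Keras, modeled on addition_rnn.py. The data processing differs slightly, and the accuracy is somewhat lower than that of addition_rnn.py. The code follows directly;
all explanations and comments are inside the code.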
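As a quick orientation to the data format the listing works with, here is a minimal sketch of how one question/answer pair is laid out as fixed-width strings: the question is right-padded with spaces to 7 characters and the answer to 4. The helper name `format_pair` and its parameter names are hypothetical and do not appear in the original script; only the padding rule is taken from it.

```python
def format_pair(num1, num2, maxlen=7, answer_len=4):
    """Return (question, answer), each right-padded with spaces to a fixed width.

    format_pair is a hypothetical helper used only for illustration.
    """
    question = '{}+{}'.format(num1, num2)
    answer = str(num1 + num2)
    return question.ljust(maxlen), answer.ljust(answer_len)


question, answer = format_pair(123, 23)
print(repr(question))  # '123+23 '
print(repr(answer))    # '146 '
```

The widths cover the largest case the script can generate: '999+999' is exactly 7 characters and its sum, '1998', is exactly 4.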

# coding: utf-8
from __future__ import print_function  # makes print() behave the same on Python 2 and 3

from keras.models import Sequential
from keras.layers.recurrent import LSTM
# Keras 1.x layer names; later Keras versions use TimeDistributed(Dense(...))
# in place of TimeDistributedDense
from keras.layers.core import RepeatVector, TimeDistributedDense, Activation
from numpy import random
import numpy as np
from utils import log  # the author's own logging helper, not part of Keras

'''
A first Keras LSTM that learns addition, adapted from addition_rnn.py.
size: 500
10 iterations: test_acu = 0.3050 base_acu= 0.3600
30 iterations: test_acu = 0.3300 base_acu= 0.4250
size: 50000
10 iterations: test_acu: loss: 0.4749 - acc: 0.8502 - val_loss: 0.4601 - val_acc: 0.8539
               base_acu: loss: 0.3707 - acc: 0.9008 - val_loss: 0.3327 - val_acc: 0.9135
20 iterations: test_acu: loss: 0.1536 - acc: 0.9505 - val_loss: 0.1314 - val_acc: 0.9584
               base_acu: loss: 0.0538 - acc: 0.9891 - val_loss: 0.0454 - val_acc: 0.9919
30 iterations: test_acu: loss: 0.0671 - acc: 0.9809 - val_loss: 0.0728 - val_acc: 0.9766
               base_acu: loss: 0.0139 - acc: 0.9980 - val_loss: 0.0502 - val_acc: 0.9839
'''
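log = log()
# Define the global variables
training_size = 50000
hidden_size = 128
batch_size = 128
layers = 1
maxlen = 7        # length of the longest question, e.g. '999+999'
single_digit = 3  # digits per operand; an answer has at most single_digit + 1 digits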
def generate_data():
    log.info("generate the questions and answers")
    questions = []
    expected = []
    seen = set()
    while len(seen) < training_size:
        # numpy's randint excludes the upper bound, so 1000 yields a number in [1, 999]
        num1 = random.randint(1, 1000)
        num2 = random.randint(1, 1000)
        # Sort the pair before storing it in the set, so that a+b and b+a count as
        # the same question and every sample is unique
        key = tuple(sorted((num1, num2)))
        if key in seen:
            continue
        seen.add(key)
        q = '{}+{}'.format(num1, num2)
        query = q + ' ' * (maxlen - len(q))  # right-pad the question to maxlen
        ans = str(num1 + num2)
        ans = ans + ' ' * (single_digit + 1 - len(ans))  # right-pad the answer to 4 characters
        questions.append(query)
        expected.append(ans)
    return questions, expected
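class CharacterTable():
    '''
    encode: convert a string into a one-hot 2-D array
    decode: convert a one-hot 2-D array back into a string
    The character set shared by inputs and outputs is
    character_table = [' ', '+', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
    A question such as '123+23 ' (right-padded to 7 characters)
    is encoded as an array of shape (7, 12).
    An answer has at most four digits, e.g. '146 ' (right-padded to 4 characters),
    so it is encoded as an array of shape (4, 12).
    '''
    def __init__(self, chars, maxlen):
        self.chars = sorted(set(chars))
        '''
        With a = self.chars:
        >>> b = [(c, i) for i, c in enumerate(a)]
        >>> dict(b)
        {' ': 0, '+': 1, '1': 3, '0': 2, '3': 5, '2': 4, '5': 7, '4': 6, '7': 9, '6': 8, '9': 11, '8': 10}
        On Python 2 a dict prints in arbitrary order, but lookups are unaffected.
        Because self.chars is sorted, every character gets the same index on every run.
        '''
        self.char_index = dict((c, i) for i, c in enumerate(self.chars))
        self.index_char = dict((i, c) for i, c in enumerate(self.chars))
        self.maxlen = maxlen

    def encode(self, C, maxlen):
        # One row per character, one column per symbol in the character set
        X = np.zeros((maxlen, len(self.chars)))
        for i, c in enumerate(C):
            X[i, self.char_index[c]] = 1
        return X

    def decode(self, X, calc_argmax=True):
        # With calc_argmax, pick the most probable symbol at every position
        if calc_argmax:
            X = X.argmax(axis=-1)
        return ''.join(self.index_char[x] for x in X)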
chars = '0123456789 +'
character_table = CharacterTable(chars, maxlen)
questions, expected = generate_data()
log.info('Vectorization...')
inputs = np.zeros((len(questions), maxlen, len(chars)))           # (50000, 7, 12)
labels = np.zeros((len(expected), single_digit + 1, len(chars)))  # (50000, 4, 12)
log.info("encoding the questions and get inputs")
for i, sentence in enumerate(questions):
    inputs[i] = character_table.encode(sentence, maxlen=len(sentence))
# print("questions is ", questions[0])
# print("X is ", inputs[0])
log.info("encoding the expected and get labels")
for i, sentence in enumerate(expected):
    labels[i] = character_table.encode(sentence, maxlen=len(sentence))
# print("expected is ", expected[0])
# print("y is ", labels[0])
log.info("total inputs is %s" % str(inputs.shape))
log.info("total labels is %s" % str(labels.shape))
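log.info("build model")
model = Sequential()
'''
LSTM(output_dim, init='glorot_uniform', inner_init='orthogonal',
     forget_bias_init='one', activation='tanh',
     inner_activation='hard_sigmoid',
     W_regularizer=None, U_regularizer=None, b_regularizer=None,
     dropout_W=0., dropout_U=0., **kwargs)
output_dim: dimensionality of the layer's output (the number of units)
init:
    uniform(scale=0.05): uniform distribution, commonly used. Values are drawn from [-scale, scale], here [-0.05, 0.05]; the default scale is 0.05.
    lecun_uniform: a uniform variant from LeCun's 1998 paper. The difference is scale = sqrt(3 / f_in), where f_in is the number of rows of the weight matrix being initialized.
    normal: normal (Gaussian) distribution.
    identity: for 2-D square matrices; returns an identity matrix.
    orthogonal: for 2-D matrices; returns an orthogonal matrix. This is the LSTM default for inner_init.
    zero: returns an all-zero matrix.
    glorot_normal: based on normal. The default for normal is sigma = scale = 0.05; here sigma = scale = sqrt(2 / (f_in + f_out)), where f_in and f_out are the rows and columns of the matrix being initialized.
    glorot_uniform: based on uniform. The default for uniform is scale = 0.05; here scale = sqrt(6 / (f_in + f_out)), where f_in and f_out are the rows and columns of the matrix being initialized.
W_regularizer, b_regularizer and activity_regularizer:
    Official documentation: http://keras.io/regularizers/
    from keras.regularizers import l2, activity_l2
    model.add(Dense(64, input_dim=64, W_regularizer=l2(0.01), activity_regularizer=activity_l2(0.01)))
    Regularization terms are added mainly to curb overfitting on small datasets. The two usual remedies for overfitting during training are adding a regularization term (weight decay) and enlarging the dataset.
    Enlarging the dataset is usually hard while adding a regularizer is easy, so regularization is typically the first remedy to try when overfitting appears.
'''
model.add(LSTM(hidden_size, input_shape=(maxlen, len(chars))))  # input layer, shape (7, 12)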
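'''
keras.layers.core.RepeatVector(n)
Repeats the input n times. If the input has shape (nb_samples, dim), the output has shape (nb_samples, n, dim).
input shape: 2-D tensor (nb_samples, dim). When this layer is the first layer of a model, pass input_shape (a tuple that excludes the sample axis).
output shape: 3-D tensor (nb_samples, n, dim)
'''
model.add(RepeatVector(single_digit + 1))
# layers sets how many LSTM layers are stacked after the RepeatVector
for _ in range(layers):
    model.add(LSTM(hidden_size, return_sequences=True))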
'''
TimeDistributedDense:
Official documentation: http://keras.io/layers/core/#timedistributeddense
keras.layers.core.TimeDistributedDense(output_dim, init='glorot_uniform', activation='linear', weights=None,
    W_regularizer=None, b_regularizer=None, activity_regularizer=None, W_constraint=None, b_constraint=None,
    input_dim=None, input_length=None)
A fully connected layer applied independently at every time step. It is mainly used when building an RNN (recurrent neural network), and the preceding recurrent layer must set return_sequences=True.
For example:
    # input shape: (nb_samples, timesteps, 10)
    model.add(LSTM(5, return_sequences=True, input_dim=10))  # output shape: (nb_samples, timesteps, 5)
    model.add(TimeDistributedDense(15))  # output shape: (nb_samples, timesteps, 15)
W_constraint:
    from keras.constraints import maxnorm
    model.add(Dense(64, W_constraint=maxnorm(2)))  # caps the norm of each unit's incoming weight vector at 2
'''
model.add(TimeDistributedDense(len(chars)))
model.add(Activation('softmax'))
'''
For the loss functions and optimizers, see another post: http://blog.csdn.net/zjm750617105/article/details/51321915
'''
model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])
# Train in rounds of 2 epochs each, holding out 10% of the data for validation.
# nb_epoch is the Keras 1.x name of this argument (later versions call it epochs).
for iteration in range(1, 3):
    print()
    print('-' * 50)
    print('Iteration', iteration)
    model.fit(inputs, labels, batch_size=batch_size, nb_epoch=2,
              validation_split=0.1)
# addition_rnn.py goes on to sample 10 validation examples at random to visualize
# errors; this script only retrieves the model configuration.
model.get_config()
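The one-hot scheme implemented by CharacterTable in the listing can be checked in isolation, without Keras. The following is a minimal sketch under stated assumptions: plain Python lists stand in for the NumPy arrays of the script, and the names `CHARS`, `CHAR_INDEX`, `one_hot_encode` and `one_hot_decode` are hypothetical, introduced only for this illustration. It shows that a 7-character question becomes a 7 x 12 grid and that taking the argmax of each row recovers the string.

```python
# Same character set as the script, sorted so each symbol has a fixed index
CHARS = sorted(set('0123456789 +'))  # [' ', '+', '0', '1', ..., '9']
CHAR_INDEX = {c: i for i, c in enumerate(CHARS)}


def one_hot_encode(text):
    """Return one row per character; each row holds a single 1."""
    rows = []
    for c in text:
        row = [0] * len(CHARS)
        row[CHAR_INDEX[c]] = 1
        rows.append(row)
    return rows


def one_hot_decode(rows):
    """Take the index of the largest value in each row (argmax) and map it back."""
    return ''.join(CHARS[row.index(max(row))] for row in rows)


encoded = one_hot_encode('123+23 ')
print(len(encoded), len(encoded[0]))  # 7 12
print(repr(one_hot_decode(encoded)))  # '123+23 '
```

Because decoding uses argmax rather than an exact match on 1, the same function also works on rows of softmax probabilities, which is the form the network's output takes.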
