Course 5 (Sequence Models), Week 1 (Recurrent Neural Networks) - 2. Programming Assignment: Dinosaur Island - Character-Level Language Modeling


Character level language model - Dinosaurus land

Welcome to Dinosaurus Island! 65 million years ago, dinosaurs existed, and in this assignment they are back. You are in charge of a special task. Leading biology researchers are creating new breeds of dinosaurs and bringing them to life on earth, and your job is to give names to these dinosaurs. If a dinosaur does not like its name, it might go berserk, so choose wisely!

 

Luckily you have learned some deep learning and you will use it to save the day. Your assistant has collected a list of all the dinosaur names they could find, and compiled them into this dataset. (Feel free to take a look by clicking the previous link.) To create new dinosaur names, you will build a character level language model to generate new names. Your algorithm will learn the different name patterns, and randomly generate new names. Hopefully this algorithm will keep you and your team safe from the dinosaurs' wrath!

 

By completing this assignment you will learn:

  • How to store text data for processing using an RNN
  • How to synthesize data, by sampling predictions at each time step and passing it to the next RNN-cell unit
  • How to build a character-level text generation recurrent neural network
  • Why clipping the gradients is important


 

We will begin by loading in some functions that we have provided for you in rnn_utils. Specifically, you have access to functions such as rnn_forward and rnn_backward which are equivalent to those you've implemented in the previous assignment.

【code】

import numpy as np
from utils import *
import random

 

1 - Problem Statement

1.1 - Dataset and Preprocessing

Run the following cell to read the dataset of dinosaur names, create a list of unique characters (such as a-z), and compute the dataset and vocabulary size.

【code】

data = open('dinos.txt', 'r').read()
data= data.lower()
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
print('There are %d total characters and %d unique characters in your data.' % (data_size, vocab_size))

【result】

There are 19909 total characters and 27 unique characters in your data.

The characters are a-z (26 characters) plus the "\n" (or newline character), which in this assignment plays a role similar to the <EOS> (or "End of sentence") token we had discussed in lecture, only here it indicates the end of the dinosaur name rather than the end of a sentence. In the cell below, we create a python dictionary (i.e., a hash table) to map each character to an index from 0-26. We also create a second python dictionary that maps each index back to the corresponding character. This will help you figure out what index corresponds to what character in the probability distribution output of the softmax layer. Below, char_to_ix and ix_to_char are the python dictionaries.

【code】

char_to_ix = { ch:i for i,ch in enumerate(sorted(chars)) }
ix_to_char = { i:ch for i,ch in enumerate(sorted(chars)) }
print(ix_to_char)

【result】

{0: '\n', 1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e', 6: 'f', 7: 'g', 8: 'h', 9: 'i', 10: 'j', 11: 'k', 12: 'l', 13: 'm', 14: 'n', 15: 'o', 16: 'p', 17: 'q', 18: 'r', 19: 's', 20: 't', 21: 'u', 22: 'v', 23: 'w', 24: 'x', 25: 'y', 26: 'z'}
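As a quick sanity check (an editor's illustration, not part of the graded notebook), these two dictionaries can be used to round-trip a name between characters and indices; the snippet below only assumes the char_to_ix and ix_to_char dictionaries defined above.

【code】

# Minimal sketch: encode a (hypothetical) name to indices and decode it back.
name = "trex"
encoded = [char_to_ix[ch] for ch in name]          # [20, 18, 5, 24]
decoded = "".join(ix_to_char[i] for i in encoded)  # back to "trex"
print(encoded, decoded)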


  

1.2 - Overview of the model

Your model will have the following structure:

  • Initialize parameters
  • Run the optimization loop
    • Forward propagation to compute the loss function
    • Backward propagation to compute the gradients with respect to the loss function
    • Clip the gradients to avoid exploding gradients
    • Using the gradients, update your parameters with the gradient descent update rule.
  • Return the learned parameters

Figure 1: Recurrent Neural Network, similar to what you had built in the previous notebook "Building a RNN - Step by Step".

At each time-step, the RNN tries to predict the next character given the previous characters. The dataset $X = (x^{\langle 1 \rangle}, x^{\langle 2 \rangle}, ..., x^{\langle T_x \rangle})$ is a list of characters in the training set, while $Y = (y^{\langle 1 \rangle}, y^{\langle 2 \rangle}, ..., y^{\langle T_x \rangle})$ is such that at every time-step $t$, we have $y^{\langle t \rangle} = x^{\langle t+1 \rangle}$.
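To make that label shift concrete, here is a tiny standalone illustration (an editor's sketch with a hypothetical name, not assignment code): at every position the target is simply the next character, with "\n" marking the end of the name.

【code】

# Illustration of y<t> = x<t+1> for a hypothetical training name "trex".
name = "trex"
x_chars = list(name)                 # ['t', 'r', 'e', 'x']
y_chars = list(name[1:]) + ['\n']    # ['r', 'e', 'x', '\n']
for x_t, y_t in zip(x_chars, y_chars):
    print(repr(x_t), '->', repr(y_t))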

 

2 - Building blocks of the model

In this part, you will build two important blocks of the overall model:

  • Gradient clipping: to avoid exploding gradients
  • Sampling: a technique used to generate characters

You will then apply these two functions to build the model.

 

2.1 - Clipping the gradients in the optimization loop

In this section you will implement the clip function that you will call inside of your optimization loop. Recall that your overall loop structure usually consists of a forward pass, a cost computation, a backward pass, and a parameter update. Before updating the parameters, you will perform gradient clipping when needed to make sure that your gradients are not "exploding," meaning taking on overly large values.

In the exercise below, you will implement a function clip that takes in a dictionary of gradients and returns a clipped version of the gradients if needed. There are different ways to clip gradients; we will use a simple element-wise clipping procedure, in which every element of the gradient vector is clipped to lie within some range [-N, N]. More generally, you will provide a maxValue (say 10). In this example, if any component of the gradient vector is greater than 10, it is set to 10; if any component of the gradient vector is less than -10, it is set to -10. If it is between -10 and 10, it is left alone.

 

 

  Figure 2: Visualization of gradient descent with and without gradient clipping, in a case where the network is running into slight "exploding gradient" problems.
 
Exercise: Implement the function below to return the clipped gradients of your dictionary gradients. Your function takes in a maximum threshold and returns the clipped versions of the gradients. See the numpy documentation for np.clip for examples of how to clip in numpy. You will need to use the argument out = ....
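Since the original hint link is not reproduced in this post, the short standalone snippet below (an editor's illustration, not graded code) shows the in-place behaviour of np.clip with the out argument:

【code】

import numpy as np

a = np.array([-12.0, -3.0, 0.5, 7.0, 42.0])
np.clip(a, -10, 10, out=a)   # clip in place: values outside [-10, 10] are saturated
print(a)                     # array is now [-10., -3., 0.5, 7., 10.]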
 
 【code】
### GRADED FUNCTION: clip

def clip(gradients, maxValue):
    '''
    Clips the gradients' values between minimum and maximum.
    
    Arguments:
    gradients -- a dictionary containing the gradients "dWaa", "dWax", "dWya", "db", "dby"
    maxValue -- everything above this number is set to this number, and everything less than -maxValue is set to -maxValue
    
    Returns: 
    gradients -- a dictionary with the clipped gradients.
    '''
    
    dWaa, dWax, dWya, db, dby = gradients['dWaa'], gradients['dWax'], gradients['dWya'], gradients['db'], gradients['dby']
   
    ### START CODE HERE ###
    # Clip to mitigate exploding gradients; loop over [dWax, dWaa, dWya, db, dby]. (≈2 lines)
    for gradient in [dWax, dWaa, dWya, db, dby]:
        np.clip(gradient, -maxValue, maxValue, out=gradient)   # numpy.clip(a, a_min, a_max, out=None)
    ### END CODE HERE ###
    
    gradients = {"dWaa": dWaa, "dWax": dWax, "dWya": dWya, "db": db, "dby": dby}
    
    return gradients

np.random.seed(3)
dWax = np.random.randn(5,3)*10
dWaa = np.random.randn(5,5)*10
dWya = np.random.randn(2,5)*10
db = np.random.randn(5,1)*10
dby = np.random.randn(2,1)*10
gradients = {"dWax": dWax, "dWaa": dWaa, "dWya": dWya, "db": db, "dby": dby}
gradients = clip(gradients, 10)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("gradients[\"dWax\"][3][1] =", gradients["dWax"][3][1])
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])

【result】  

gradients["dWaa"][1][2] = 10.0
gradients["dWax"][3][1] = -10.0
gradients["dWya"][1][2] = 0.29713815361
gradients["db"][4] = [ 10.]
gradients["dby"][1] = [ 8.45833407]

【Expected output】

gradients["dWaa"][1][2]	10.0
gradients["dWax"][3][1]	-10.0
gradients["dWya"][1][2]	0.29713815361
gradients["db"][4]	[ 10.]
gradients["dby"][1]	[ 8.45833407]

  

2.2 - Sampling

Now assume that your model is trained. You would like to generate new text (characters). The process of generation is explained in the picture below:

Figure 3: In this picture, we assume the model is already trained. We pass in $x^{\langle 1 \rangle} = \vec{0}$ at the first time step, and have the network then sample one character at a time.

  

Exercise: Implement the sample function below to sample characters. You need to carry out 4 steps:

Step 1: Pass the network the first "dummy" input $x^{\langle 1 \rangle} = \vec{0}$ (the vector of zeros). This is the default input before we've generated any characters. We also set $a^{\langle 0 \rangle} = \vec{0}$.

Step 2: Run one step of forward propagation to get $a^{\langle 1 \rangle}$ and $\hat{y}^{\langle 1 \rangle}$. Here are the equations:

$$a^{\langle t+1 \rangle} = \tanh(W_{ax} x^{\langle t+1 \rangle} + W_{aa} a^{\langle t \rangle} + b)$$
$$z^{\langle t+1 \rangle} = W_{ya} a^{\langle t+1 \rangle} + b_y$$
$$\hat{y}^{\langle t+1 \rangle} = \mathrm{softmax}(z^{\langle t+1 \rangle})$$

Note that $\hat{y}^{\langle t+1 \rangle}$ is a (softmax) probability vector (its entries are between 0 and 1 and sum to 1). $\hat{y}^{\langle t+1 \rangle}_i$ represents the probability that the character indexed by "i" is the next character. We have provided a softmax() function that you can use.
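The softmax() helper is already provided in utils, so you do not need to write it yourself; for reference, a numerically stable version typically looks like the sketch below (an assumption about the helper, not the graded code).

【code】

import numpy as np

def softmax_sketch(z):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)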

Step 3: Carry out sampling: Pick the next character's index according to the probability distribution specified by $\hat{y}^{\langle t+1 \rangle}$. This means that if $\hat{y}^{\langle t+1 \rangle}_i = 0.16$, you will pick the index "i" with 16% probability. To implement it, you can use np.random.choice.

Here is an example of how to use np.random.choice():

np.random.seed(0)
p = np.array([0.1, 0.0, 0.7, 0.2])
index = np.random.choice([0, 1, 2, 3], p = p.ravel())

This means that you will pick the index according to the distribution: P(index=0)=0.1,P(index=1)=0.0,P(index=2)=0.7,P(index=3)=0.2.  

Step 4: The last step to implement in sample() is to overwrite the variable x, which currently stores $x^{\langle t \rangle}$, with the value of $x^{\langle t+1 \rangle}$. You will represent $x^{\langle t+1 \rangle}$ by creating a one-hot vector corresponding to the character you've chosen as your prediction. You will then forward propagate $x^{\langle t+1 \rangle}$ in Step 1 and keep repeating the process until you get a "\n" character, indicating you've reached the end of the dinosaur name.
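For Step 4, the one-hot overwrite can be pictured with the short standalone snippet below (an editor's illustration; vocab_size = 27 as in this assignment):

【code】

import numpy as np

vocab_size = 27
idx = 5                          # suppose the sampled index corresponds to 'e'
x = np.zeros((vocab_size, 1))    # reset the input vector
x[idx] = 1                       # one-hot encode the sampled character for the next time step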

 

 
 
【code】
# GRADED FUNCTION: sample

def sample(parameters, char_to_ix, seed):
    """
    Sample a sequence of characters according to a sequence of probability distributions output of the RNN

    Arguments:
    parameters -- python dictionary containing the parameters Waa, Wax, Wya, by, and b. 
    char_to_ix -- python dictionary mapping each character to an index.
    seed -- used for grading purposes. Do not worry about it.

    Returns:
    indices -- a list of length n containing the indices of the sampled characters.
    """
    
    # Retrieve parameters and relevant shapes from "parameters" dictionary
    Waa, Wax, Wya, by, b = parameters['Waa'], parameters['Wax'], parameters['Wya'], parameters['by'], parameters['b']
    vocab_size = by.shape[0]  # vocab_size is the size of the vocabulary (number of unique characters)
    n_a = Waa.shape[1]
    
    
    ### START CODE HERE ###
    # Step 1: Create the one-hot vector x for the first character (initializing the sequence generation). (≈1 line)
    x = np.zeros((vocab_size,1))   # x is a one-hot input vector (all zeros for the first "dummy" input)
    # Step 1': Initialize a_prev as zeros (≈1 line)
    a_prev = np.zeros((n_a,1))   # a_prev is the (n_a, 1) hidden-state vector
    
    # Create an empty list of indices, this is the list which will contain the list of indices of the characters to generate (≈1 line)
    indices = []
    
    # Idx is a flag to detect a newline character, we initialize it to -1
    idx = -1 
    
    # Loop over time-steps t. At each time-step, sample a character from a probability distribution and append 
    # its index to "indices". We'll stop if we reach 50 characters (which should be very unlikely with a well 
    # trained model), which helps debugging and prevents entering an infinite loop. 
    counter = 0
    newline_character = char_to_ix['\n']
    
    while (idx != newline_character and counter != 50):
        
        # Step 2: Forward propagate x using the equations (1), (2) and (3)
        a = np.tanh(np.dot(Wax,x)+np.dot(Waa,a_prev)+b)
        z = np.dot(Wya,a)+by
        y = softmax(z)
        
        # for grading purposes
        np.random.seed(counter+seed) 
        
        # Step 3: Sample the index of a character within the vocabulary from the probability distribution y
        # np.random.choice(vocab_size, p=y.ravel()) is equivalent to
        # np.random.choice(np.arange(vocab_size), p=y.ravel()), i.e. it samples an index
        # from [0, 1, ..., vocab_size-1] according to the probability distribution y
        idx = np.random.choice(vocab_size, p=y.ravel())

        # Append the index to "indices"
        indices.append(idx)
        
        # Step 4: Overwrite the input character as the one corresponding to the sampled index.
        # Rebuild x as a one-hot vector: set the entry at the sampled index to 1
        x = np.zeros((vocab_size,1))
        x[idx] = 1
        
        # Update "a_prev" to be "a"
        a_prev = a
        
        # for grading purposes
        seed += 1
        counter +=1
        
    ### END CODE HERE ###

    if (counter == 50):
        indices.append(char_to_ix['\n'])
    
    return indices

np.random.seed(2)
_, n_a = 20, 100
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}


indices = sample(parameters, char_to_ix, 0)
print("Sampling:")
print("list of sampled indices:", indices)
print("list of sampled characters:", [ix_to_char[i] for i in indices])

【result】

Sampling:
list of sampled indices: [12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, 7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 5, 6, 12, 25, 0, 0]
list of sampled characters: ['l', 'q', 'x', 'n', 'm', 'i', 'j', 'v', 'x', 'f', 'm', 'k', 'l', 'f', 'u', 'o', 'u', 'n', 'c', 'b', 'a', 'u', 'r', 'x', 'g', 'y', 'f', 'y', 'r', 'j', 'p', 'b', 'c', 'h', 'o', 'l', 'k', 'g', 'a', 'l', 'j', 'b', 'g', 'g', 'k', 'e', 'f', 'l', 'y', '\n', '\n']

【Expected output】

list of sampled indices:	[12, 17, 24, 14, 13, 9, 10, 22, 24, 6, 13, 11, 12, 6, 21, 15, 21, 14, 3, 2, 1, 21, 18, 24, 
7, 25, 6, 25, 18, 10, 16, 2, 3, 8, 15, 12, 11, 7, 1, 12, 10, 2, 7, 7, 11, 5, 6, 12, 25, 0, 0]
list of sampled characters:	['l', 'q', 'x', 'n', 'm', 'i', 'j', 'v', 'x', 'f', 'm', 'k', 'l', 'f', 'u', 'o', 
'u', 'n', 'c', 'b', 'a', 'u', 'r', 'x', 'g', 'y', 'f', 'y', 'r', 'j', 'p', 'b', 'c', 'h', 'o', 
'l', 'k', 'g', 'a', 'l', 'j', 'b', 'g', 'g', 'k', 'e', 'f', 'l', 'y', '\n', '\n']

  

3 - Building the language model

It is time to build the character-level language model for text generation.

3.1 - Gradient descent

In this section you will implement a function performing one step of stochastic gradient descent (with clipped gradients). You will go through the training examples one at a time, so the optimization algorithm will be stochastic gradient descent. As a reminder, here are the steps of a common optimization loop for an RNN:

  • Forward propagate through the RNN to compute the loss
  • Backward propagate through time to compute the gradients of the loss with respect to the parameters
  • Clip the gradients if necessary
  • Update your parameters using gradient descent


 

Exercise: Implement this optimization process (one step of stochastic gradient descent). We provide you with the following functions:

def rnn_forward(X, Y, a_prev, parameters):
    """ Performs the forward propagation through the RNN and computes the cross-entropy loss.
    It returns the loss' value as well as a "cache" storing values to be used in the backpropagation."""
    ....
    return loss, cache

def rnn_backward(X, Y, parameters, cache):
    """ Performs the backward propagation through time to compute the gradients of the loss with respect
    to the parameters. It returns also all the hidden states."""
    ...
    return gradients, a

def update_parameters(parameters, gradients, learning_rate):
    """ Updates parameters using the Gradient Descent Update Rule."""
    ...
    return parameters
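The bodies of these helpers are provided for you and are not reproduced in this post. As a rough idea of what the last one does, a plain gradient-descent update could look like the sketch below (an editor's assumption, not the provided implementation):

【code】

def update_parameters_sketch(parameters, gradients, learning_rate):
    # Apply one gradient-descent step to each weight matrix and bias.
    parameters['Wax'] += -learning_rate * gradients['dWax']
    parameters['Waa'] += -learning_rate * gradients['dWaa']
    parameters['Wya'] += -learning_rate * gradients['dWya']
    parameters['b']   += -learning_rate * gradients['db']
    parameters['by']  += -learning_rate * gradients['dby']
    return parameters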

  

【code】

# GRADED FUNCTION: optimize

def optimize(X, Y, a_prev, parameters, learning_rate = 0.01):
    """
    Execute one step of the optimization to train the model.
    
    Arguments:
    X -- list of integers, where each integer is a number that maps to a character in the vocabulary.
    Y -- list of integers, exactly the same as X but shifted one index to the left.
    a_prev -- previous hidden state.
    parameters -- python dictionary containing:
                        Wax -- Weight matrix multiplying the input, numpy array of shape (n_a, n_x)
                        Waa -- Weight matrix multiplying the hidden state, numpy array of shape (n_a, n_a)
                        Wya -- Weight matrix relating the hidden-state to the output, numpy array of shape (n_y, n_a)
                        b --  Bias, numpy array of shape (n_a, 1)
                        by -- Bias relating the hidden-state to the output, numpy array of shape (n_y, 1)
    learning_rate -- learning rate for the model.
    
    Returns:
    loss -- value of the loss function (cross-entropy)
    gradients -- python dictionary containing:
                        dWax -- Gradients of input-to-hidden weights, of shape (n_a, n_x)
                        dWaa -- Gradients of hidden-to-hidden weights, of shape (n_a, n_a)
                        dWya -- Gradients of hidden-to-output weights, of shape (n_y, n_a)
                        db -- Gradients of bias vector, of shape (n_a, 1)
                        dby -- Gradients of output bias vector, of shape (n_y, 1)
    a[len(X)-1] -- the last hidden state, of shape (n_a, 1)
    """
    
    ### START CODE HERE ###
    
    # Forward propagate through time (≈1 line)
    loss, cache =rnn_forward(X, Y, a_prev, parameters)
    
    # Backpropagate through time (≈1 line)
    gradients, a = rnn_backward(X, Y, parameters, cache)
    
    # Clip your gradients between -5 (min) and 5 (max) (≈1 line)
    gradients =  clip(gradients,5)
    
    # Update parameters (≈1 line)
    parameters = update_parameters(parameters, gradients, learning_rate)
    
    ### END CODE HERE ###
    
    return loss, gradients, a[len(X)-1]

np.random.seed(1)
vocab_size, n_a = 27, 100
a_prev = np.random.randn(n_a, 1)
Wax, Waa, Wya = np.random.randn(n_a, vocab_size), np.random.randn(n_a, n_a), np.random.randn(vocab_size, n_a)  # the input x is a one-hot vector of length vocab_size
b, by = np.random.randn(n_a, 1), np.random.randn(vocab_size, 1)
parameters = {"Wax": Wax, "Waa": Waa, "Wya": Wya, "b": b, "by": by}
X = [12,3,5,11,22,3]
Y = [4,14,11,22,25, 26]

loss, gradients, a_last = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
print("Loss =", loss)
print("gradients[\"dWaa\"][1][2] =", gradients["dWaa"][1][2])
print("np.argmax(gradients[\"dWax\"]) =", np.argmax(gradients["dWax"]))
print("gradients[\"dWya\"][1][2] =", gradients["dWya"][1][2])
print("gradients[\"db\"][4] =", gradients["db"][4])
print("gradients[\"dby\"][1] =", gradients["dby"][1])
print("a_last[4] =", a_last[4])

【result】

Loss = 126.503975722
gradients["dWaa"][1][2] = 0.194709315347
np.argmax(gradients["dWax"]) = 93
gradients["dWya"][1][2] = -0.007773876032
gradients["db"][4] = [-0.06809825]
gradients["dby"][1] = [ 0.01538192]
a_last[4] = [-1.]

【Expected output】

Loss	126.503975722
gradients["dWaa"][1][2]	0.194709315347
np.argmax(gradients["dWax"])	93
gradients["dWya"][1][2]	-0.007773876032
gradients["db"][4]	[-0.06809825]
gradients["dby"][1]	[ 0.01538192]
a_last[4]	[-1.]

 

3.2 - Training the model

Given the dataset of dinosaur names, we use each line of the dataset (one name) as one training example. Every 2000 steps of stochastic gradient descent, you will sample 7 names to see how the algorithm is doing (as in the model() code below). Remember to shuffle the dataset, so that stochastic gradient descent visits the examples in random order.


 

Exercise: Follow the instructions and implement model(). When examples[index] contains one dinosaur name (a string), you can create an example (X, Y) like this:

index = j % len(examples)
X = [None] + [char_to_ix[ch] for ch in examples[index]] 
Y = X[1:] + [char_to_ix["\n"]]

Note that we use index = j % len(examples), where j = 1, ..., num_iterations, to make sure that examples[index] is always a valid index (index is smaller than len(examples)). The first entry of X being None will be interpreted by rnn_forward() as setting $x^{\langle 0 \rangle} = \vec{0}$. Further, this ensures that Y is equal to X but shifted one step to the left, with an additional "\n" appended to signify the end of the dinosaur name.
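As a concrete illustration of this construction (an editor's example with a hypothetical name, using the char_to_ix mapping printed earlier, where a=1, ..., z=26 and '\n'=0):

【code】

example = "trex"                                   # hypothetical entry of examples
X = [None] + [char_to_ix[ch] for ch in example]    # [None, 20, 18, 5, 24]
Y = X[1:] + [char_to_ix["\n"]]                     # [20, 18, 5, 24, 0]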

【code】

# GRADED FUNCTION: model

def model(data, ix_to_char, char_to_ix, num_iterations = 35000, n_a = 50, dino_names = 7, vocab_size = 27):
    """
    Trains the model and generates dinosaur names. 
    
    Arguments:
    data -- text corpus
    ix_to_char -- dictionary that maps the index to a character
    char_to_ix -- dictionary that maps a character to an index
    num_iterations -- number of iterations to train the model for
    n_a -- number of units of the RNN cell
    dino_names -- number of dinosaur names you want to sample at each iteration. 
    vocab_size -- number of unique characters found in the text, size of the vocabulary
    
    Returns:
    parameters -- learned parameters
    """
    
    # Retrieve n_x and n_y from vocab_size
    n_x, n_y = vocab_size, vocab_size
    
    # Initialize parameters
    parameters = initialize_parameters(n_a, n_x, n_y)
    
    # Initialize loss (this is required because we want to smooth our loss, don't worry about it)
    loss = get_initial_loss(vocab_size, dino_names)
    
    # Build list of all dinosaur names (training examples).
    with open("dinos.txt") as f:
        examples = f.readlines()
    examples = [x.lower().strip() for x in examples]
    
    # Shuffle list of all dinosaur names
    np.random.seed(0)
    np.random.shuffle(examples)
    
    # Initialize the hidden state of your RNN
    a_prev = np.zeros((n_a, 1))
    
    # Optimization loop
    for j in range(num_iterations):
        
        ### START CODE HERE ###
        
        # Use the hint above to define one training example (X,Y) (≈ 2 lines)
        index = j % len(examples)
        X = [None] + [char_to_ix[ch] for ch in examples[index]] 
        Y = X[1:] + [char_to_ix["\n"]]
        
        # Perform one optimization step: Forward-prop -> Backward-prop -> Clip -> Update parameters
        # Choose a learning rate of 0.01
        curr_loss, gradients, a_prev = optimize(X, Y, a_prev, parameters, learning_rate = 0.01)
        
        ### END CODE HERE ###
        
        # Use a latency trick to keep the loss smooth. It happens here to accelerate the training.
        loss = smooth(loss, curr_loss)

        # Every 2000 iterations, generate "n" names using sample() to check if the model is learning properly
        if j % 2000 == 0:
            
            print('Iteration: %d, Loss: %f' % (j, loss) + '\n')
            
            # The number of dinosaur names to print
            seed = 0
            for name in range(dino_names):
                
                # Sample indices and print them
                sampled_indices = sample(parameters, char_to_ix, seed)  # each call to sample() generates one dinosaur name
                print_sample(sampled_indices, ix_to_char)
                
                seed += 1  # To get the same result for grading purposed, increment the seed by one. 
      
            print('\n')
        
    return parameters

Run the following cell; you should observe your model outputting random-looking characters at the first iteration. After a few thousand iterations, your model should learn to generate reasonable-looking names.

 

【code】

parameters = model(data, ix_to_char, char_to_ix)

【result】

Iteration: 0, Loss: 23.087336

Nkzxwtdmfqoeyhsqwasjkjvu
Kneb
Kzxwtdmfqoeyhsqwasjkjvu
Neb
Zxwtdmfqoeyhsqwasjkjvu
Eb
Xwtdmfqoeyhsqwasjkjvu


Iteration: 2000, Loss: 27.884160

Liusskeomnolxeros
Hmdaairus
Hytroligoraurus
Lecalosapaus
Xusicikoraurus
Abalpsamantisaurus
Tpraneronxeros


Iteration: 4000, Loss: 25.901815

Mivrosaurus
Inee
Ivtroplisaurus
Mbaaisaurus
Wusichisaurus
Cabaselachus
Toraperlethosdarenitochusthiamamumamaon


Iteration: 6000, Loss: 24.608779

Onwusceomosaurus
Lieeaerosaurus
Lxussaurus
Oma
Xusteonosaurus
Eeahosaurus
Toreonosaurus


Iteration: 8000, Loss: 24.070350

Onxusichepriuon
Kilabersaurus
Lutrodon
Omaaerosaurus
Xutrcheps
Edaksoje
Trodiktonus


Iteration: 10000, Loss: 23.844446

Onyusaurus
Klecalosaurus
Lustodon
Ola
Xusodonia
Eeaeosaurus
Troceosaurus


Iteration: 12000, Loss: 23.291971

Onyxosaurus
Kica
Lustrepiosaurus
Olaagrraiansaurus
Yuspangosaurus
Eealosaurus
Trognesaurus


Iteration: 14000, Loss: 23.382339

Meutromodromurus
Inda
Iutroinatorsaurus
Maca
Yusteratoptititan
Ca
Troclosaurus


Iteration: 16000, Loss: 23.288447

Meuspsangosaurus
Ingaa
Iusosaurus
Macalosaurus
Yushanis
Daalosaurus
Trpandon


Iteration: 18000, Loss: 22.823526

Phytrolonhonyg
Mela
Mustrerasaurus
Peg
Ytronorosaurus
Ehalosaurus
Trolomeehus


Iteration: 20000, Loss: 23.041871

Nousmofonosaurus
Loma
Lytrognatiasaurus
Ngaa
Ytroenetiaudostarmilus
Eiafosaurus
Troenchulunosaurus


Iteration: 22000, Loss: 22.728849

Piutyrangosaurus
Midaa
Myroranisaurus
Pedadosaurus
Ytrodon
Eiadosaurus
Trodoniomusitocorces


Iteration: 24000, Loss: 22.683403

Meutromeisaurus
Indeceratlapsaurus
Jurosaurus
Ndaa
Yusicheropterus
Eiaeropectus
Trodonasaurus


Iteration: 26000, Loss: 22.554523

Phyusaurus
Liceceron
Lyusichenodylus
Pegahus
Yustenhtonthosaurus
Elagosaurus
Trodontonsaurus


Iteration: 28000, Loss: 22.484472

Onyutimaerihus
Koia
Lytusaurus
Ola
Ytroheltorus
Eiadosaurus
Trofiashates


Iteration: 30000, Loss: 22.774404

Phytys
Lica
Lysus
Pacalosaurus
Ytrochisaurus
Eiacosaurus
Trochesaurus


Iteration: 32000, Loss: 22.209473

Mawusaurus
Jica
Lustoia
Macaisaurus
Yusolenqtesaurus
Eeaeosaurus
Trnanatrax


Iteration: 34000, Loss: 22.396744

Mavptokekus
Ilabaisaurus
Itosaurus
Macaesaurus
Yrosaurus
Eiaeosaurus
Trodon

  

Conclusion

You can see that your algorithm has started to generate plausible dinosaur names towards the end of the training. At first, it was generating random characters, but towards the end you could see dinosaur names with cool endings. Feel free to run the algorithm even longer and play with hyperparameters to see if you can get even better results. Our implementation generated some really cool names like maconucon, marloralus and macingsersaurus. Your model hopefully also learned that dinosaur names tend to end in saurus, don, aura, tor, etc.

If your model generates some non-cool names, don't blame the model entirely--not all actual dinosaur names sound cool. (For example, dromaeosauroides is an actual dinosaur name and is in the training set.) But this model should give you a set of candidates from which you can pick the coolest!

This assignment used a relatively small dataset, so that you could train an RNN quickly on a CPU. Training a model of the English language requires a much bigger dataset, usually needs much more computation, and could run for many hours on GPUs. We ran our dinosaur name model for quite some time, and so far our favorite name is the great, undefeatable, and fierce: Mangosaurus!

 


  

4 - Writing like Shakespeare

The rest of this notebook is optional and is not graded, but we hope you'll do it anyway since it's quite fun and informative.

A similar (but more complicated) task is to generate Shakespeare poems. Instead of learning from a dataset of dinosaur names, you can use a collection of Shakespearian poems. Using LSTM cells, you can learn longer-term dependencies that span many characters in the text--e.g., where a character appearing somewhere in a sequence can influence what should be a different character much later in the sequence. These long-term dependencies were less important with dinosaur names, since the names were quite short.

 

  

 
We have implemented a Shakespeare poem generator with Keras. Run the following cell to load the required packages and models. This may take a few minutes.
 
【code】
from __future__ import print_function
from keras.callbacks import LambdaCallback
from keras.models import Model, load_model, Sequential
from keras.layers import Dense, Activation, Dropout, Input, Masking
from keras.layers import LSTM
from keras.utils.data_utils import get_file
from keras.preprocessing.sequence import pad_sequences
from shakespeare_utils import *
import sys
import io

【result】

Loading text data...
Creating training set...
number of training examples: 31412
Vectorizing training set...
Loading model...

  

To save you some time, we have already trained a model for ~1000 epochs on a collection of Shakespearian poems called "The Sonnets".

 

Let's train the model for one more epoch. When it finishes training for an epoch (this will also take a few minutes), you can run generate_output, which will prompt you for an input (<40 characters). The poem will start with your sentence, and our RNN-Shakespeare will complete the rest of the poem for you! For example, try "Forsooth this maketh no sense " (don't enter the quotation marks). Depending on whether you include the space at the end, your results might also differ--try it both ways, and try other inputs as well.

 

【code】

print_callback = LambdaCallback(on_epoch_end=on_epoch_end)

model.fit(x, y, batch_size=128, epochs=1, callbacks=[print_callback])

【result】  

Epoch 1/1
31412/31412 [==============================] - 213s - loss: 2.5632   
<keras.callbacks.History at 0x7f5469add400>

  

【code】

# Run this cell to try with different inputs without having to re-train the model 
generate_output()
 
          

【result】  

 
Write the beginning of your poem, the Shakespeare machine will complete it. Your input is: Forsooth this maketh no sense


Here is your poem: 

Forsooth this maketh no sense,
phore sanrel maspy to danciging,
and make that woer oh (treased's from fro ly.
if least to me the suffertife of feer by caosed,
hid trolse fritce dedibe the word the miget,
buf my leass were comfoss that in thou hant'st gaod,
his shade the wilf thit whete spool my sade.
cince switt wat pen swalce on thee thee de to yout chasse?
bes it she might all most do thi ale agay.
but lose my 'stain shull 

  

 

The RNN-Shakespeare model is very similar to the one you have built for dinosaur names. The only major differences are:

  • LSTMs instead of the basic RNN to capture longer-range dependencies
  • The model is a deeper, stacked LSTM model (2 layer)
  • Using Keras instead of raw Python/NumPy to simplify the code

If you want to learn more, you can also check out the Keras Team's text generation implementation on GitHub: https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py.
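For readers curious what such a stacked character-level model looks like in Keras, here is a rough sketch (an editor's illustration, not the shakespeare_utils code; Tx and vocab_size are placeholder values):

【code】

from keras.models import Sequential
from keras.layers import LSTM, Dense, Dropout

Tx, vocab_size = 40, 38          # hypothetical sequence length and character-set size

model_sketch = Sequential([
    LSTM(128, return_sequences=True, input_shape=(Tx, vocab_size)),  # first LSTM layer returns full sequences
    Dropout(0.2),
    LSTM(128),                                                       # second, stacked LSTM layer
    Dense(vocab_size, activation='softmax')                          # predict the next character
])
model_sketch.compile(loss='categorical_crossentropy', optimizer='adam')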

Congratulations on finishing this notebook!

 

