網絡結構:
代碼如下:
# encoding: utf-8
"""Two-layer SimpleRNN sentiment classifier for the IMDB dataset (TF2/Keras).

Loads the integer-encoded IMDB reviews, pads/truncates them to a fixed
length, and trains a cell-level two-layer SimpleRNN for binary sentiment
classification.
"""
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, losses, optimizers, Model
# from exam_rnn import MyRNN  # (the model could also live in a separate module)

batchsz = 128        # batch size
total_words = 10000  # vocabulary size N_vocab
max_review_len = 80  # max sentence length s; longer reviews truncated, shorter padded
embedding_len = 100  # word-embedding feature length n

# Load the IMDB dataset; each review is a sequence of integer word IDs.
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=total_words)
# Print the input shapes and the label shapes.
print(x_train.shape, len(x_train[0]), y_train.shape)  # (25000,) 218 (25000,)
print(x_test.shape, len(x_test[0]), y_test.shape)     # (25000,) 68 (25000,)

'''# Inspect the integer encoding (disabled example code)
# Word -> integer table
word_index = keras.datasets.imdb.get_word_index()
# Print every word and its integer ID
# for k, v in word_index.items():
#     print(k, v)

# The first 4 IDs are reserved for special tokens
word_index = {k: (v + 3) for k, v in word_index.items()}
word_index["<PAD>"] = 0     # padding marker
word_index["<START>"] = 1   # start-of-sequence marker
word_index["<UNK>"] = 2     # unknown-word marker
word_index["<UNUSED>"] = 3
# Invert the table: integer -> word
reverse_word_index = dict([(value, key) for (key, value) in
                           word_index.items()])


# Convert an integer-encoded review back into a readable string.
def decode_review(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])


print(decode_review(x_train[0]))
'''

# Truncate and pad reviews to equal length: long reviews keep their tail,
# short ones are padded at the front (Keras defaults padding='pre', truncating='pre').
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=max_review_len)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=max_review_len)

# Build the datasets: shuffle, batch, and drop the last incomplete batch —
# the model's fixed-size initial state ([batchsz, units]) requires full batches.
db_train = tf.data.Dataset.from_tensor_slices((x_train, y_train))
db_train = db_train.shuffle(1000).batch(batchsz, drop_remainder=True)
db_test = tf.data.Dataset.from_tensor_slices((x_test, y_test))
db_test = db_test.batch(batchsz, drop_remainder=True)

print('x_train shape:', x_train.shape, tf.reduce_max(y_train), tf.reduce_min(y_train))
# x_train shape: (25000, 80) tf.Tensor(1, shape=(), dtype=int64) tf.Tensor(0, shape=(), dtype=int64)
print('x_test shape:', x_test.shape)
# x_test shape: (25000, 80)


class MyRNN(Model):
    """Two-layer SimpleRNN built cell-by-cell for binary sentiment classification."""

    def __init__(self, units):
        """Build the layers.

        Args:
            units: length of each RNN cell's state vector.
        """
        super(MyRNN, self).__init__()
        # [b, units] zero initial state for each cell, reused on every call.
        self.state0 = [tf.zeros([batchsz, units])]  # [128, 64]
        self.state1 = [tf.zeros([batchsz, units])]  # [128, 64]
        # Word embedding: [b, 80] => [b, 80, 100]
        self.embedding = layers.Embedding(total_words, embedding_len, input_length=max_review_len)
        # Two RNN cells; dropout could be enabled to curb overfitting.
        self.rnn_cell0 = layers.SimpleRNNCell(units)  # , dropout=0.5)
        self.rnn_cell1 = layers.SimpleRNNCell(units)  # , dropout=0.5)
        # Classification head for the final cell output, 2-class:
        # [b, 80, 100] => [b, 64] => [b, 1]
        self.outlayer = layers.Dense(1)

    def call(self, inputs, training=None):
        """Return p(positive | review) for a batch of padded reviews.

        Args:
            inputs: int tensor of word IDs, shape [128, 80].
            training: standard Keras training flag, forwarded to the cells.

        Returns:
            Float tensor of probabilities, shape [128, 1].
        """
        x = inputs  # [128, 80]
        # Embed the word IDs: [128, 80] => [128, 80, 100]
        x = self.embedding(x)
        # Unroll the sequence one timestep at a time through both cells:
        # [128, 80, 100] => [128, 64]
        state0 = self.state0
        state1 = self.state1
        for word in tf.unstack(x, axis=1):  # word: [128, 100]
            # Pass `training` by keyword: the cells' signature is
            # call(inputs, states, training=None), so this is robust even if
            # positional order ever changes.
            out0, state0 = self.rnn_cell0(word, state0, training=training)
            out1, state1 = self.rnn_cell1(out0, state1, training=training)

        # Last timestep output of the top cell feeds the classifier:
        # [128, 64] => [128, 1]
        x = self.outlayer(out1)
        # Sigmoid activation gives p(y is positive | x).
        prob = tf.sigmoid(x)
        return prob


def main():
    """Create, train, and evaluate the model on the IMDB datasets."""
    units = 64   # RNN state-vector length n
    epochs = 20  # number of training epochs
    model = MyRNN(units)  # create the model
    # Compile.
    # NOTE(review): experimental_run_tf_function was a TF 2.0/2.1 workaround
    # for custom subclassed models; the kwarg was removed in later TF releases
    # and must be dropped when upgrading — confirm against the installed version.
    model.compile(optimizer=optimizers.Adam(0.001),
                  loss=losses.BinaryCrossentropy(), metrics=['accuracy'],
                  experimental_run_tf_function=False)
    # Train with per-epoch validation.
    model.fit(db_train, epochs=epochs, validation_data=db_test)
    # Final evaluation on the held-out test set.
    scores = model.evaluate(db_test)
    print("Final test loss and accuracy :", scores)


if __name__ == '__main__':
    main()
測試的誤差和准確率:
Final test loss and accuracy : [1.3201157276447002, 0.80188304]
下一次更新:LSTM情感分類問題