Keras Learning Journey (1)



Software environment (Windows):

  • Visual Studio
  • Anaconda
  • CUDA
  • MinGW-w64
    • conda install -c anaconda mingw libpython
  • CNTK
  • TensorFlow-gpu
  • Keras-gpu
  • Theano
  • MKL
  • CuDNN

Reference book: 謝梁, 魯穎, 勞虹嵐, *Keras 快速上手:基於Python的深度學習實戰* (Keras Quick Start: Hands-on Deep Learning with Python)

Introduction to Keras

The name Keras refers to the Gate of Horn in the classical Greek epic, the *Odyssey*: "Those that come through the Ivory Gate cheat us with empty promises that never see fulfillment. Those that come through the Gate of Horn inform the dreamer of truth."

Advantages of Keras:

  1. Keras is designed with people in mind and emphasizes rapid prototyping: users can quickly map the structure of the model they need onto Keras code, minimizing the amount of code that has to be written.

  2. It supports the common existing architectures, such as CNNs and RNNs.

  3. It is highly modular; users can combine modules almost arbitrarily to build the model they need:
    In Keras, any neural network model can be described as a graph (functional) model or a sequential model, whose components fall into:
    - network layers
    - loss functions
    - activation functions
    - initialization methods
    - regularization methods
    - optimizers

  4. Being based on Python, it is easy for users to implement custom modules.

  5. It switches seamlessly between CPU and GPU.


1 Models in Keras

About Keras models

Keras has two types of models: the sequential model (Sequential) and the functional model (Model). The functional model is the more widely applicable; the sequential model is a special case of it. The functional model is also called the generic model.

Both model types share these main methods:

  • model.summary(): prints a summary of the model; internally it calls keras.utils.print_summary
  • model.get_config(): returns a Python dict containing the model's configuration. The model can also be reconstructed from its config

For Model there is Model.from_config, which I have not worked out how to use.

For Sequential:

config = model.get_config()
model = Sequential.from_config(config)
  • model.get_layer(): retrieves a layer object by name or index
  • model.get_weights(): returns the model's weight tensors as a list of numpy arrays
  • model.set_weights(): loads weights into the model from numpy arrays; the arrays must have the same shapes as those returned by model.get_weights().

1.1 The Sequential model

The sequential model is a simplified form of the functional model (i.e. Sequential is a special case of the generic model): the simplest linear, start-to-finish ordering of layers with no branching. The layers are strictly sequential; between layer \(k\) and layer \(k+1\) various components can be inserted to build the network. These components can be specified in a list, which is then passed to the sequential model to produce the corresponding model.

Basic components of a Sequential model:

  1. model.add: add layers;
  2. model.compile: configure the backpropagation/training setup;
  3. model.fit: set training parameters and train;
  4. model evaluation
  5. model prediction

1.1.1 add: adding layers

The sequential model is a linear stack of network layers, a single path with no branches.
You can build it by passing a list of layers to the Sequential constructor:

from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential([Dense(32, input_shape=(784,)),
                    Activation('relu'),
                    Dense(10),
                    Activation('softmax'),
                   ])
Using TensorFlow backend.

Layers can also be added one at a time with the .add() method:

model = Sequential()
model.add(Dense(32, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

1.1.2 Specifying the input shape

The model needs to know the shape of its input data, so the first layer of a Sequential model must receive information about the input shape. The following layers infer their shapes automatically, so the parameter is not needed for them. There are several ways to specify the input shape for the first layer:

  • Pass an input_shape keyword argument to the first layer. input_shape is a tuple; entries may be None, meaning that position can be any positive integer. The batch size is not included in input_shape.
  • Some 2D layers, such as Dense, support specifying the input shape implicitly through the integer argument input_dim. Some 3D temporal layers accept the parameters input_dim and input_length to specify the input shape.
  • If you need a fixed batch_size for the input (common with stateful RNNs), pass a batch_size argument to the layer. For example, for an input tensor with batch size \(32\) and data shape \((6,8)\), pass batch_size=32 and input_shape=(6,8).
model = Sequential()
model.add(Dense(32, input_dim= 784))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_6 (Dense)              (None, 32)                25120     
=================================================================
Total params: 25,120
Trainable params: 25,120
Non-trainable params: 0
_________________________________________________________________
model = Sequential()
model.add(Dense(32, input_shape=(784,)))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_8 (Dense)              (None, 32)                25120     
=================================================================
Total params: 25,120
Trainable params: 25,120
Non-trainable params: 0
_________________________________________________________________
model = Sequential()
model.add(Dense(100, input_shape= (32, 32, 3)))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_9 (Dense)              (None, 32, 32, 100)       400       
=================================================================
Total params: 400
Trainable params: 400
Non-trainable params: 0
_________________________________________________________________

The parameter count is \(400 = 3 \times 100 + 100\) (including the bias terms): Dense acts only on the last input axis.
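The parameter count of a Dense layer can be checked by hand: it depends only on the last input dimension and the number of units, not on the other axes. A minimal sketch of the formula (not Keras code, just the arithmetic):

```python
def dense_param_count(input_dim, units, use_bias=True):
    """Weight matrix is (input_dim, units); the bias adds `units` parameters."""
    return input_dim * units + (units if use_bias else 0)

# Dense(100) applied to inputs whose last dimension is 3:
print(dense_param_count(3, 100))    # 3*100 + 100 = 400
# Dense(32) on a 784-dimensional input, as in the earlier summaries:
print(dense_param_count(784, 32))   # 784*32 + 32 = 25120
```

Both values match the `Param #` column of the model summaries shown here.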

model = Sequential()
model.add(Dense(100, input_shape= (32, 32, 3), batch_size= 64))
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_10 (Dense)             (64, 32, 32, 100)         400       
=================================================================
Total params: 400
Trainable params: 400
Non-trainable params: 0
_________________________________________________________________

1.1.3 Compilation

Before training a model, we configure its learning process with compile.
compile accepts the following parameters:

  • optimizer: either the name of a predefined optimizer, such as rmsprop or adagrad, or an instance of an Optimizer class. See the optimizers documentation.

  • loss: the objective function the model tries to minimize; either the name of a predefined loss, such as categorical_crossentropy or mse, or a custom loss function. See the losses documentation.

  • metrics: for classification problems this list is usually set to metrics=['accuracy']. A metric can be the name of a predefined metric or a user-defined function. A metric function should return a single tensor, or a dict mapping metric_name -> metric_value.

  • sample_weight_mode: set to "temporal" if you need to weight samples per timestep (a 2D weight matrix).
    The default "None" weights per sample (1D weights). See the notes under the fit function below.

  • kwargs: ignore this parameter when using the TensorFlow backend; with the Theano backend, the kwargs are passed on to K.function

Note:

A model must be compiled before use; otherwise calling fit or evaluate raises an exception.

# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# For custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])

1.1.4 Training

Keras takes Numpy arrays as the data type for inputs and labels. Models are usually trained with the fit function:

fit(self, x, y, batch_size=32, epochs=10, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0)

This function trains the model for epochs rounds. Its parameters are:

  • x: input data. If the model has a single input, x is a numpy array; if it has multiple inputs, x is a list whose elements are the numpy arrays for the individual inputs

  • y: labels, a numpy array

  • batch_size: integer, the number of samples per gradient-descent batch. Each batch is used for one gradient update, moving the objective one optimization step.

  • epochs: integer, the number of training rounds; each epoch is one pass over the training set.

  • verbose: logging verbosity. 0 prints nothing to stdout, 1 shows a progress bar, 2 prints one line per epoch

  • callbacks: a list of keras.callbacks.Callback objects. These callbacks are invoked at the appropriate points during training; see the callbacks documentation

  • validation_split: a float between \(0\) and \(1\), the fraction of the training data to hold out as a validation set. The validation data does not participate in training; the model's metrics, such as loss and accuracy, are evaluated on it after each epoch.

    • Note that the validation_split slice is taken before shuffling, so if your data is ordered you must shuffle it manually before setting validation_split; otherwise the validation set may be unrepresentative.
  • validation_data: a tuple (X, y) used as the validation set. This parameter overrides validation_split.

  • shuffle: boolean or string, usually a boolean indicating whether to randomly shuffle the input samples during training. The string "batch" is a special case for handling HDF5 data: it shuffles within each batch.

  • class_weight: a dict mapping classes to weights, used to scale the loss function during training (training only)

  • sample_weight: a numpy array of weights used to scale the loss function during training (training only). Pass either a 1D vector of the same length as the samples for \(1\):\(1\) sample weighting, or, for temporal data, a matrix of shape (samples, sequence_length) to assign a weight to every timestep of every sample. In the latter case make sure to pass sample_weight_mode='temporal' to compile

  • initial_epoch: start training from this epoch; useful for resuming a previous run.

fit returns a History object whose History.history attribute records the loss and other metric values per epoch, including the validation metrics if a validation set was used
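The validation_split caveat can be seen in plain Python: the validation slice is taken from the *end* of the arrays before any shuffling, so ordered data yields a skewed validation set. A minimal sketch of the slicing, mirroring the documented behaviour rather than Keras' actual implementation:

```python
def split_validation(samples, validation_split):
    # the validation set is the tail of the data, sliced *before* shuffling
    split_at = int(len(samples) * (1 - validation_split))
    return samples[:split_at], samples[split_at:]

data = list(range(10))            # ordered data, e.g. sorted by class
train, val = split_validation(data, 0.2)
print(train)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(val)    # [8, 9]  -- only the last samples, hence the advice to shuffle first
```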

Note:
Distinguish fit from the later fit_generator; the two take different x/y inputs.

Case 1: simple binary classification

One epoch is one full pass over the training set, i.e. \(\text{samples per epoch} = batch\_size \times iterations\); epochs = 10 means the training set is traversed ten times
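The relation between epochs, batches, and gradient updates can be made concrete: with 1000 samples and batch_size=32 (the values used in this case), one epoch takes ceil(1000/32) = 32 updates, and epochs=10 repeats that ten times. A small arithmetic sketch:

```python
import math

def iterations_per_epoch(num_samples, batch_size):
    # one gradient update per batch; the last batch may be smaller
    return math.ceil(num_samples / batch_size)

updates = iterations_per_epoch(1000, 32)
print(updates)        # 32 gradient updates per epoch
print(updates * 10)   # 320 updates over epochs=10
```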

from keras.models import Sequential
from keras.layers import Dense, Activation

# model-building stage
model = Sequential()  # instantiate the model class

# Dense(32) is a fully-connected layer with 32 hidden units.
model.add(Dense(32, activation='relu', input_dim= 100))

model.add(Dense(1, activation='sigmoid'))

# For custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs =10, batch_size=32)
Using TensorFlow backend.


Epoch 1/10
1000/1000 [==============================] - 3s - loss: 0.7218 - acc: 0.4780 - mean_pred: 0.5181     
Epoch 2/10
1000/1000 [==============================] - 0s - loss: 0.7083 - acc: 0.4990 - mean_pred: 0.5042     
Epoch 3/10
1000/1000 [==============================] - 0s - loss: 0.7053 - acc: 0.4850 - mean_pred: 0.5174     
Epoch 4/10
1000/1000 [==============================] - 0s - loss: 0.6978 - acc: 0.5400 - mean_pred: 0.5074     
Epoch 5/10
1000/1000 [==============================] - 0s - loss: 0.6938 - acc: 0.5250 - mean_pred: 0.5088     
Epoch 6/10
1000/1000 [==============================] - 0s - loss: 0.6887 - acc: 0.5290 - mean_pred: 0.5196     
Epoch 7/10
1000/1000 [==============================] - 0s - loss: 0.6847 - acc: 0.5570 - mean_pred: 0.5052     
Epoch 8/10
1000/1000 [==============================] - 0s - loss: 0.6797 - acc: 0.5530 - mean_pred: 0.5134     
Epoch 9/10
1000/1000 [==============================] - 0s - loss: 0.6749 - acc: 0.5790 - mean_pred: 0.5126     
Epoch 10/10
1000/1000 [==============================] - 0s - loss: 0.6728 - acc: 0.5920 - mean_pred: 0.5118     





<keras.callbacks.History at 0x1eafe9b9240>

1.1.5 evaluate: model evaluation

evaluate(self, x, y, batch_size=32, verbose=1, sample_weight=None)

This function computes the model's loss on the given input data, batch by batch. Its parameters are:

  • x: input data; as in fit, a numpy array or a list of numpy arrays

  • y: labels, a numpy array

  • batch_size: integer, same meaning as in fit

  • verbose: same meaning as in fit, but only 0 or 1 are allowed

  • sample_weight: numpy array, same meaning as in fit

The function returns a scalar test loss (if the model has no other metrics) or a list of scalars (if it does). model.metrics_names gives the meaning of each entry in the list.

model.evaluate(data, labels, batch_size=32)
 512/1000 [==============>...............] - ETA: 0s




[0.62733754062652591, 0.68200000000000005, 0.54467054557800298]
model.metrics_names
['loss', 'acc', 'mean_pred']

1.1.6 predict: model prediction

predict(self, x, batch_size=32, verbose=0)
predict_classes(self, x, batch_size=32, verbose=1)
predict_proba(self, x, batch_size=32, verbose=1)
  • predict: returns the model's output for the input data, computed batch by batch, as a numpy array of predictions
  • predict_classes: returns the predicted class for each input, computed batch by batch;
  • predict_proba: returns the per-class probabilities for each input, computed batch by batch
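For a single-unit sigmoid model like the one in this case, predict_classes amounts to thresholding the predict output at 0.5 (for multi-class softmax outputs it takes the argmax instead). A sketch of that relationship, using probabilities close to the ones printed below as assumed example values:

```python
def classes_from_proba(proba, threshold=0.5):
    # binary sigmoid case: class = 1 when the predicted probability exceeds 0.5
    return [[1 if p > threshold else 0 for p in row] for row in proba]

proba = [[0.394], [0.391], [0.597], [0.531], [0.567]]  # example sigmoid outputs
print(classes_from_proba(proba))  # [[0], [0], [1], [1], [1]]
```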
model.predict(data[:5])
array([[ 0.39388809],
       [ 0.39062682],
       [ 0.59655035],
       [ 0.53066045],
       [ 0.56720185]], dtype=float32)
model.predict_classes(data[:5])
5/5 [==============================] - 0s





array([[0],
       [0],
       [1],
       [1],
       [1]])
model.predict_proba(data[:5])
5/5 [==============================] - 0s





array([[ 0.39388809],
       [ 0.39062682],
       [ 0.59655035],
       [ 0.53066045],
       [ 0.56720185]], dtype=float32)

1.1.7 on_batch results: model checking

  • train_on_batch: performs a single parameter update on one batch of data; returns the training loss as a scalar or a list of scalars, as with evaluate.
  • test_on_batch: evaluates the model on one batch of samples; the return value matches evaluate
  • predict_on_batch: runs the model on one batch of samples; returns the model's predictions for that batch
model.train_on_batch(data, labels)
[0.62733746, 0.68199992, 0.54467058]
model.train_on_batch(data, labels)
[0.62483531, 0.68799996, 0.52803379]

1.1.8 fit_generator

  • Uses a Python generator to produce batches of data one at a time for training.
  • The generator runs in parallel with the model for efficiency.
  • For example, this allows real-time data augmentation on the CPU while the model trains on the GPU
    Reference: http://keras-cn.readthedocs.io/en/latest/models/sequential/

With this function, image-classification training tasks become straightforward.

model.fit_generator(generator, steps_per_epoch, epochs=1, verbose=1, callbacks=None, validation_data=None, validation_steps=None, class_weight=None, max_queue_size=10, workers=1, use_multiprocessing=False, initial_epoch=0)

The function's parameters are:

  • generator: a generator function whose output should be either:
    • a tuple of the form (inputs, targets)
    • a tuple of the form (inputs, targets, sample_weight).
      All returned values should contain the same number of samples. The generator loops over the dataset indefinitely; an epoch ends once the model has consumed steps_per_epoch batches.
  • steps_per_epoch: integer; an epoch ends after the generator has returned steps_per_epoch batches, after which the next epoch begins
  • epochs: integer, the number of passes over the data
  • verbose: logging verbosity; 0 prints nothing to stdout, 1 shows a progress bar, 2 prints one line per epoch
  • validation_data: one of the following three forms
    • a generator yielding validation batches
    • a tuple of the form (inputs, targets)
    • a tuple of the form (inputs, targets, sample_weights)
  • validation_steps: when validation_data is a generator, the number of batches to draw from it for validation
  • class_weight: a dict mapping classes to weights, commonly used for imbalanced datasets.
  • sample_weight: a numpy array of weights used to scale the loss during training (training only). Pass either a 1D vector of the same length as the samples for \(1\):\(1\) weighting, or, for temporal data, a matrix of shape (samples, sequence_length) to weight every timestep of every sample; in that case pass sample_weight_mode='temporal' to compile
  • workers: maximum number of processes
  • max_q_size: maximum size of the generator queue
  • pickle_safe: if true, use process-based threading. Since this implementation relies on multiprocessing, non-picklable arguments cannot be passed to the generator, because they cannot easily be handed to child processes.
  • initial_epoch: start training from this epoch; useful for resuming a previous run.
    The function returns a History object.

Example (process_line is a user-supplied parser turning one line of the file into an (x, y) pair):

def generate_arrays_from_file(path):
    while 1:
        f = open(path)
        for line in f:
            # create Numpy arrays of input data
            # and labels, from each line in the file
            x, y = process_line(line)
            yield (x, y)
        f.close()

model.fit_generator(generate_arrays_from_file('/my_file.txt'), steps_per_epoch= 1000, epochs=10)

1.1.9 Two other helper functions:

  • evaluate_generator: evaluates the model using a generator as the data source; the generator should yield the same kind of data as accepted by test_on_batch. Its parameters have the same meaning as the same-named fit_generator parameters; steps is the number of batches to draw from the generator.
  • predict_generator: generates predictions using a generator as the data source; the generator should yield the same kind of data as accepted by predict_on_batch. Its parameters have the same meaning as the same-named fit_generator parameters; steps is the number of batches to draw from the generator.

Case 2: multi-class classification with a VGG-style convolutional network

Note the usage of keras.utils.to_categorical:

It is similar to One-Hot encoding:

keras.utils.to_categorical(y, num_classes=None)
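to_categorical maps an integer label vector to one-hot rows. A pure-Python sketch of the same transformation (an illustration of the behaviour, not the Keras implementation):

```python
def to_one_hot(labels, num_classes):
    # each integer label i becomes a row with a 1 at position i
    return [[1 if j == label else 0 for j in range(num_classes)]
            for label in labels]

print(to_one_hot([0, 2, 1], 3))
# [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```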
# -*- coding:utf-8 -*-

import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD
from keras.utils import np_utils

# Generate dummy data
x_train = np.random.random((100, 100, 100, 3))
# 100 images, each of size 100*100*3
y_train = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)
# 100*10
x_test = np.random.random((20, 100, 100, 3))
y_test = keras.utils.to_categorical(np.random.randint(10, size=(20, 1)), num_classes=10)
# 20*10

model = Sequential()  # the simplest linear, start-to-finish stack of layers, no branching
# input: 100x100 images with 3 channels -> (100, 100, 3) tensors.
# this applies 32 convolution filters of size 3x3 each.
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(100, 100, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)

model.fit(x_train, y_train, batch_size=32, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=32)
score
Epoch 1/10
100/100 [==============================] - 1s - loss: 2.3800     
Epoch 2/10
100/100 [==============================] - 0s - loss: 2.3484     
Epoch 3/10
100/100 [==============================] - 0s - loss: 2.3034     
Epoch 4/10
100/100 [==============================] - 0s - loss: 2.2938     
Epoch 5/10
100/100 [==============================] - 0s - loss: 2.2874     
Epoch 6/10
100/100 [==============================] - 0s - loss: 2.2873     
Epoch 7/10
100/100 [==============================] - 0s - loss: 2.3132     - ETA: 0s - loss: 2.31
Epoch 8/10
100/100 [==============================] - 0s - loss: 2.2866     
Epoch 9/10
100/100 [==============================] - 0s - loss: 2.2814     
Epoch 10/10
100/100 [==============================] - 0s - loss: 2.2856     
20/20 [==============================] - 0s





2.2700035572052002

Sequence classification with an LSTM

The same model, using a stateful LSTM

The defining feature of a stateful LSTM is that after processing one batch of training data, its internal state (memory) is used as the initial state for the next batch. Stateful LSTMs let us process long sequences at a reasonable computational cost
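The stateful behaviour can be pictured with a toy recurrence: sample i of batch k starts from the state that sample i of batch k-1 left behind, instead of starting from zero. A minimal sketch, with a running-sum "cell" standing in for the LSTM state (not an LSTM, just the state-carrying pattern):

```python
def run_stateful(batches, num_sequences):
    # one persistent state per sequence position, carried across batches
    state = [0.0] * num_sequences
    for batch in batches:                      # batch: one value per sequence
        state = [s + x for s, x in zip(state, batch)]
    return state

# Two batches; sequence 0 sees 1 then 3, sequence 1 sees 2 then 4.
print(run_stateful([[1, 2], [3, 4]], num_sequences=2))  # [4.0, 6.0]
```

This is also why the Keras example passes shuffle=False: shuffling would break the correspondence between sample positions across batches.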

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np

data_dim = 16
timesteps = 8
num_classes = 10
batch_size = 32

# Expected input batch shape: (batch_size, timesteps, data_dim)
# Note that we have to provide the full batch_input_shape since the network is stateful.
# the sample of index i in batch k is the follow-up for the sample i in batch k-1.
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Generate dummy training data
x_train = np.random.random((batch_size * 10, timesteps, data_dim))
y_train = np.random.random((batch_size * 10, num_classes))

# Generate dummy validation data
x_val = np.random.random((batch_size * 3, timesteps, data_dim))
y_val = np.random.random((batch_size * 3, num_classes))

model.fit(x_train, y_train,
          batch_size=batch_size, epochs=5, shuffle=False,
          validation_data=(x_val, y_val))
Train on 320 samples, validate on 96 samples
Epoch 1/5
320/320 [==============================] - 2s - loss: 11.4843 - acc: 0.1062 - val_loss: 11.2222 - val_acc: 0.1042
Epoch 2/5
320/320 [==============================] - 0s - loss: 11.4815 - acc: 0.1031 - val_loss: 11.2207 - val_acc: 0.1250
Epoch 3/5
320/320 [==============================] - 0s - loss: 11.4799 - acc: 0.0844 - val_loss: 11.2202 - val_acc: 0.1562
Epoch 4/5
320/320 [==============================] - 0s - loss: 11.4790 - acc: 0.1000 - val_loss: 11.2198 - val_acc: 0.1562
Epoch 5/5
320/320 [==============================] - 0s - loss: 11.4780 - acc: 0.1094 - val_loss: 11.2194 - val_acc: 0.1250





<keras.callbacks.History at 0x1ab0e78ff28>

Keras FAQ:

Frequently asked questions: http://keras-cn.readthedocs.io/en/latest/for_beginners/FAQ/


1.2 Model (the generic model, also called the functional (Functional) model)

The functional model is called Functional, but its class name is Model, so we sometimes also use Model to refer to the functional model.

The Keras functional model API is the way to define complex models such as multi-output models, directed acyclic graphs, or models with shared layers. The functional model is the most general class of models; the sequential model (Sequential) is just a special case of it. For more on the sequential model, see the Sequential model API.

The generic model can describe networks of arbitrary complexity and topology. Like the sequential model, it uses a functional interface to define the model.

To define one, you start from the input tensors, define each layer and its components, and finish with the output layer. Passing the input and output layers to Model produces a model object, which can then be compiled and fitted.

Basic attributes and training workflow of a functional model:

  1. model.layers: the layer information;
  2. model.compile: configure the backpropagation/training setup;
  3. model.fit: set training parameters and train;
  4. evaluate: model evaluation;
  5. predict: model prediction

1.2.1 Common Model attributes

  • model.layers: the layers making up the model graph
  • model.inputs: the list of the model's input tensors
  • model.outputs: the list of the model's output tensors

1.2.2 compile: training configuration

compile(self, optimizer, loss, metrics=None, loss_weights=None, sample_weight_mode=None)

This function compiles the model for training. Its parameters are

  • optimizer: the optimizer, either a predefined optimizer name or an optimizer object
  • loss: the loss function, either a predefined loss name or an objective function
  • metrics: a list of metrics evaluated during training and testing; a typical usage is metrics=['accuracy']. To assign different metrics to different outputs of a multi-output model, pass a dict to this parameter, e.g. metrics={'output_a': 'accuracy'}
  • sample_weight_mode: set to "temporal" if you need to weight samples per timestep (a 2D weight matrix); the default "None" weights per sample (1D weights).
    If the model has multiple outputs, you can pass a dict or list of sample_weight_mode values. See the notes under the fit function below.

[Tips] If you only load a model to run predict, compiling is unnecessary. In Keras, compile mainly configures the loss function and the optimizer and exists to serve training; predict performs its own symbolic-function compilation internally (by calling _make_predict_function)

1.2.3 fit: training parameter setup + training

fit(self, x=None, y=None, batch_size=32, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0)

Similar to the sequential model

1.2.4 evaluate: model evaluation

evaluate(self, x, y, batch_size=32, verbose=1, sample_weight=None)

Similar to the sequential model

1.2.5 predict: model prediction

predict(self, x, batch_size=32, verbose=0)

Similar to the sequential model

1.2.6 Model checking

  • train_on_batch: performs a single parameter update on one batch of data; returns the training loss as a scalar or a list of scalars, as with evaluate.
  • test_on_batch: evaluates the model on one batch of samples; the return value matches evaluate
  • predict_on_batch: runs the model on one batch of samples; returns the model's predictions for that batch

Similar to the sequential model

1.2.7 fit_generator

fit_generator(self, generator, steps_per_epoch, epochs=1, verbose=1, callbacks=None, validation_data=None, validation_steps=None, class_weight=None, max_q_size=10, workers=1, pickle_safe=False, initial_epoch=0)

evaluate_generator(self, generator, steps, max_q_size=10, workers=1, pickle_safe=False)

Case 3: a fully connected network

A few concepts to clarify before starting:

  • A layer object takes a tensor as its argument and returns a tensor.
  • A framework whose input is a tensor and whose output is also a tensor is a model, defined via Model.
  • Such a model can be trained just like a Keras Sequential model
import keras
from keras.layers import Input, Dense
from keras.models import Model

# a layer instance is callable on a tensor, and returns a tensor
inputs = Input(shape=(100,))

# input: inputs, output: x
x = Dense(64, activation='relu')(inputs)
# input: x, output: x
x = Dense(64, activation='relu')(x)

predictions = Dense(100, activation='softmax')(x)
# input: x, output: the class probabilities

# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = keras.utils.to_categorical(np.random.randint(2, size=(1000, 1)), num_classes=100)

# Train the model
model.fit(data, labels, batch_size=64, epochs=10) # starts training
Epoch 1/10
1000/1000 [==============================] - 0s - loss: 2.2130 - acc: 0.4650        
Epoch 2/10
1000/1000 [==============================] - 0s - loss: 0.7474 - acc: 0.4980     
Epoch 3/10
1000/1000 [==============================] - 0s - loss: 0.7158 - acc: 0.5050     
Epoch 4/10
1000/1000 [==============================] - 0s - loss: 0.7039 - acc: 0.5260     
Epoch 5/10
1000/1000 [==============================] - 0s - loss: 0.7060 - acc: 0.5280     
Epoch 6/10
1000/1000 [==============================] - 0s - loss: 0.6979 - acc: 0.5270     
Epoch 7/10
1000/1000 [==============================] - 0s - loss: 0.6854 - acc: 0.5570     
Epoch 8/10
1000/1000 [==============================] - 0s - loss: 0.6920 - acc: 0.5300     
Epoch 9/10
1000/1000 [==============================] - 0s - loss: 0.6862 - acc: 0.5620     
Epoch 10/10
1000/1000 [==============================] - 0s - loss: 0.6766 - acc: 0.5750     





<keras.callbacks.History at 0x1ec3dd2d5c0>
inputs
<tf.Tensor 'input_4:0' shape=(?, 100) dtype=float32>

Notice that the structure is completely different from the sequential model. In x = Dense(64, activation='relu')(inputs), (inputs) is the input and x is the output.
The line model = Model(inputs=inputs, outputs=predictions) is the heart of the functional model: it can take two inputs at once and produce two outputs.

I do not yet understand the time-series model below.

Case 4: video processing

Now used for transfer learning;

  • real-time prediction is also possible via TimeDistributed;

  • input_sequences is the sequence input; model is the already-trained model

x = Input(shape=(100,))
# This works, and returns the 10-way softmax we defined above.
y = model(x)
# model holds the trained weights; given input x it produces outputs, usable for fine-tuning

# classification -> video, real-time processing
from keras.layers import TimeDistributed

# Input tensor for sequences of 20 timesteps,
# each containing a 100-dimensional vector
input_sequences = Input(shape=(20, 100))

# This applies our previous model to every timestep in the input sequences.
# the output of the previous model was a 10-way softmax,
# so the output of the layer below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)  # model is already trained
processed_sequences
<tf.Tensor 'time_distributed_1/Reshape_1:0' shape=(?, 20, 100) dtype=float32>
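TimeDistributed applies the wrapped model independently to every timestep, so a (20, 100) sequence input becomes 20 separate applications of the model. A shape-only sketch of that wrapping, with toy_model standing in for the trained model (an assumed placeholder, not the Keras implementation):

```python
def time_distributed(model_fn, sequence):
    # apply the same model to every timestep independently
    return [model_fn(step) for step in sequence]

# toy stand-in: maps a 100-dim vector to a 100-dim output
toy_model = lambda vec: [sum(vec)] * 100

sequence = [[0.0] * 100 for _ in range(20)]   # 20 timesteps of 100 dims each
out = time_distributed(toy_model, sequence)
print(len(out), len(out[0]))  # 20 100
```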

Case 5: two inputs, two outputs: LSTM time-series prediction

This case shows off the essential quality of Model, its flexibility, which gives the modeller a lot of convenience.

  • Inputs:
    • the news corpus; the timestamps associated with the news corpus
  • Outputs:
    • a prediction model over the corpus alone; a prediction model over corpus plus timestamps
Model 1: an LSTM model over the news corpus only
from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model

# Headline input: meant to receive sequences of 100 integers, between 1 and 10000.
# Note that we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')
# a sequence of 100 word indices

# This embedding layer will encode the input sequence
# into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)
# the Embedding layer encodes each word into a 512-dim vector; 10000 is the vocabulary size


# A LSTM will transform the vector sequence into a single vector,
# containing information about the entire sequence
lstm_out = LSTM(32)(x)
# 32 is the dimensionality of the LSTM's output vector

# then we insert an auxiliary loss, so the LSTM and Embedding layers can train smoothly even when the main loss is high.

auxiliary_output = Dense(1, activation='sigmoid', name='aux_output')(lstm_out)
# next, we concatenate the LSTM output with the additional input data and feed it into the model:
# Model 1: a prediction model over the above sequence only
Combined model: news corpus + time series
# Model 2: the combined model
auxiliary_input = Input(shape=(5,), name='aux_input')  # the newly added Input, 5-dimensional
x = keras.layers.concatenate([lstm_out, auxiliary_input])   # combine the two, aligned per sample


# We stack a deep densely-connected network on top
# the structure of the combined model
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
# And finally we add the main logistic regression layer
main_output = Dense(1, activation='sigmoid', name='main_output')(x)


# finally, we define the full model with 2 inputs and 2 outputs:
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])
# with the model defined, the next step is to compile it.
# we give the auxiliary loss a weight of 0.2. Different losses or weights can be assigned to different
# outputs via the loss_weights or loss keyword arguments, each of which may be a Python list or dict.
# here we pass a single loss function to loss, which is then applied to all outputs.

Here Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output]) is the key:
two inputs go in, and two outputs come out:

# training option 1: two outputs, one loss
model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])
# once compiled, we train the model by passing the training data and targets:

model.fit([headline_data, additional_data], [labels, labels],
          epochs=50, batch_size=32)

# training option 2: two outputs, two losses
# because the inputs and outputs are named (we passed "name" when defining them), we can also compile and train like this:
model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})

# And trained it via:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': labels, 'aux_output': labels},
          epochs=50, batch_size=32)

Because there are two inputs and two outputs, different training parameters can be configured for each output
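With loss_weights=[1., 0.2], the quantity actually minimized is a weighted sum of the two per-output losses. A sketch of that combination, with assumed loss values for illustration:

```python
def combined_loss(losses, loss_weights):
    # total objective = sum of per-output losses scaled by their weights
    return sum(w * l for w, l in zip(loss_weights, losses))

main_loss, aux_loss = 0.50, 0.30          # assumed per-output loss values
print(combined_loss([main_loss, aux_loss], [1.0, 0.2]))  # 0.5 + 0.2*0.3 ≈ 0.56
```

Raising the auxiliary weight pulls the optimizer toward fitting the auxiliary output harder, which is the mechanism behind the smoothing effect the auxiliary loss provides.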

Case 6: shared layers: correspondence and similarity

One node, branching out into two

import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model

tweet_a = Input(shape=(140, 256))
tweet_b = Input(shape=(140, 256))
# to share one layer across different inputs, instantiate the layer once and call it multiple times
# 140 words, each a 256-dimensional word vector

# This layer can take as input a matrix
# and will return a vector of size 64
shared_lstm = LSTM(64)
# returns a vector of size 64

# When we reuse the same layer instance
# multiple times, the weights of the layer
# are also being reused
# (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)

# We can then concatenate the two vectors:
# concatenate the two results along the last axis (axis=-1)
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)

# And add a logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)
# Dense(1): a single sigmoid unit for binary classification

# We define a trainable model linking the
# tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)
# train the model, then predict

Case 7: extracting layer-node outputs

# 1. single node
a = Input(shape=(140, 256))
lstm = LSTM(32)
encoded_a = lstm(a)
assert lstm.output == encoded_a
# retrieve the output tensor of the layer, which equals encoded_a

# 2. multiple nodes
a = Input(shape=(140, 256))
b = Input(shape=(140, 256))

lstm = LSTM(32)
encoded_a = lstm(a)
encoded_b = lstm(b)

assert lstm.get_output_at(0) == encoded_a
assert lstm.get_output_at(1) == encoded_b

# 3. image-layer nodes
# The same goes for input_shape and output_shape: if a layer has only one node,
# or all of its nodes have the same input/output shape,
# then input_shape and output_shape are unambiguous and return a single value.
# But if you apply the same Conv2D first to data of size (3, 32, 32)
# and then to data of size (3, 64, 64), the layer now has multiple input and output shapes,
# and you must specify the node index explicitly to indicate which one you mean
a = Input(shape=(3, 32, 32))
b = Input(shape=(3, 64, 64))

conv = Conv2D(16, (3, 3), padding='same')
conved_a = conv(a)

# Only one input so far, the following will work:
assert conv.input_shape == (None, 3, 32, 32)

conved_b = conv(b)
# now the `.input_shape` property wouldn't work, but this does:
assert conv.get_input_shape_at(0) == (None, 3, 32, 32)
assert conv.get_input_shape_at(1) == (None, 3, 64, 64)

Case 8: a visual question-answering model

# this model maps a natural-language question and an image each to a feature vector,
# concatenates the two, and trains a logistic regression layer on top
# that picks one answer from a set of candidate answers.
import keras
from keras.layers import Conv2D, MaxPooling2D, Flatten
from keras.layers import Input, LSTM, Embedding, Dense
from keras.models import Model, Sequential

# First, let's define a vision model using a Sequential model.
# This model will encode an image into a vector.
vision_model = Sequential()
vision_model.add(Conv2D(64, (3, 3), activation='relu', padding='same', input_shape=(3, 224, 224)))
vision_model.add(Conv2D(64, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(128, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(128, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Conv2D(256, (3, 3), activation='relu', padding='same'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(Conv2D(256, (3, 3), activation='relu'))
vision_model.add(MaxPooling2D((2, 2)))
vision_model.add(Flatten())

# Now let's get a tensor with the output of our vision model:
image_input = Input(shape=(3, 224, 224))
encoded_image = vision_model(image_input)

# Next, let's define a language model to encode the question into a vector.
# Each question will be at most 100 word long,
# and we will index words as integers from 1 to 9999.
question_input = Input(shape=(100,), dtype='int32')
embedded_question = Embedding(input_dim=10000, output_dim=256, input_length=100)(question_input)
encoded_question = LSTM(256)(embedded_question)

# Let's concatenate the question vector and the image vector:
merged = keras.layers.concatenate([encoded_question, encoded_image])

# And let's train a logistic regression over 1000 words on top:
output = Dense(1000, activation='softmax')(merged)

# This is our final model:
vqa_model = Model(inputs=[image_input, question_input], outputs=output)

# The next stage would be training this model on actual data.

Extension 1: loading No_top weights when fine-tuning

If you need to load weights into a different architecture (with some layers in common), for example for fine-tuning or transfer-learning, you can load weights by layer name:
model.load_weights('my_model_weights.h5', by_name=True)

For example:
suppose the original model is:

    model = Sequential()
    model.add(Dense(2, input_dim=3, name="dense_1"))
    model.add(Dense(3, name="dense_2"))
    ...
    model.save_weights(fname)

The new model is:

model = Sequential()
model.add(Dense(2, input_dim=3, name="dense_1"))  # will be loaded
model.add(Dense(10, name="new_dense"))  # will not be loaded

# load weights from first model; will only affect the first layer, dense_1.
model.load_weights(fname, by_name=True)
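by_name=True matches saved weights to layers by layer name, which is why dense_1 loads while new_dense is skipped in the example above. A sketch of that matching logic, with plain dicts standing in for HDF5 weight files (names chosen to mirror the example):

```python
def load_weights_by_name(model_weights, saved_weights):
    # copy a saved entry only when the model has a layer with the same name
    for name in model_weights:
        if name in saved_weights:
            model_weights[name] = saved_weights[name]
    return model_weights

saved = {"dense_1": [0.1, 0.2], "dense_2": [0.3]}
model = {"dense_1": [0.0, 0.0], "new_dense": [0.0]}
print(load_weights_by_name(model, saved))
# {'dense_1': [0.1, 0.2], 'new_dense': [0.0]}  -- only dense_1 was loaded
```

Note that real weight loading also requires matching shapes; the name match alone is the part illustrated here.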

2 Learning resources



3 Keras summary

From: http://blog.csdn.net/sinat_26917383/article/details/72857454

3.1 Keras network structures

[figure: Keras network-structure diagram]

3.2 Keras network configuration

[figure: Keras network-configuration diagram]
Among these, callbacks are Keras callback functions

3.3 Keras preprocessing utilities

[figure: Keras preprocessing-utilities diagram]

3.4 Extracting model node information

For a sequential model:

%%time
import keras
from keras.models import Sequential 
from keras.layers import Dense 
import numpy as np

# implement LeNet
import keras
from keras.datasets import mnist 
(x_train, y_train), (x_test,y_test) = mnist.load_data()

x_train=x_train.reshape(-1, 28,28,1)
x_test=x_test.reshape(-1, 28,28,1)
x_train=x_train/255.
x_test=x_test/255.
y_train=keras.utils.to_categorical(y_train)
y_test=keras.utils.to_categorical(y_test)

from keras.layers import Conv2D, MaxPool2D, Dense, Flatten
from keras.models import Sequential 
lenet=Sequential()
lenet.add(Conv2D(6, kernel_size=3,strides=1, padding='same', input_shape=(28, 28, 1)))
lenet.add(MaxPool2D(pool_size=2,strides=2))
lenet.add(Conv2D(16, kernel_size=5, strides=1, padding='valid'))
lenet.add(MaxPool2D(pool_size=2, strides=2))
lenet.add(Flatten())
lenet.add(Dense(120))
lenet.add(Dense(84))
lenet.add(Dense(10, activation='softmax'))

lenet.compile('sgd',loss='categorical_crossentropy',metrics=['accuracy'])    # 編譯模型

lenet.fit(x_train,y_train,batch_size=64,epochs= 20,validation_data=[x_test,y_test], verbose= 0)  # 訓練模型

lenet.save('E:/Graphs/Models/myletnet.h5')  # 保存模型
Wall time: 2min 48s
# extracting layer information
config = lenet.get_config()  # pull the configuration out of the lenet model
config[0]
{'class_name': 'Conv2D',
 'config': {'activation': 'linear',
  'activity_regularizer': None,
  'batch_input_shape': (None, 28, 28, 1),
  'bias_constraint': None,
  'bias_initializer': {'class_name': 'Zeros', 'config': {}},
  'bias_regularizer': None,
  'data_format': 'channels_last',
  'dilation_rate': (1, 1),
  'dtype': 'float32',
  'filters': 6,
  'kernel_constraint': None,
  'kernel_initializer': {'class_name': 'VarianceScaling',
   'config': {'distribution': 'uniform',
    'mode': 'fan_avg',
    'scale': 1.0,
    'seed': None}},
  'kernel_regularizer': None,
  'kernel_size': (3, 3),
  'name': 'conv2d_7',
  'padding': 'same',
  'strides': (1, 1),
  'trainable': True,
  'use_bias': True}}
model = Sequential.from_config(config)   # rebuild a new model from the extracted config; handy for fine-tuning

3.5 Model summary, saving, and loading

1. Printing the model summary

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_7 (Conv2D)            (None, 28, 28, 6)         60        
_________________________________________________________________
max_pooling2d_7 (MaxPooling2 (None, 14, 14, 6)         0         
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 10, 10, 16)        2416      
_________________________________________________________________
max_pooling2d_8 (MaxPooling2 (None, 5, 5, 16)          0         
_________________________________________________________________
flatten_4 (Flatten)          (None, 400)               0         
_________________________________________________________________
dense_34 (Dense)             (None, 120)               48120     
_________________________________________________________________
dense_35 (Dense)             (None, 84)                10164     
_________________________________________________________________
dense_36 (Dense)             (None, 10)                850       
=================================================================
Total params: 61,610
Trainable params: 61,610
Non-trainable params: 0
_________________________________________________________________
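The parameter counts in the summary above can be verified by hand: a Conv2D layer has kernel_h × kernel_w × in_channels × filters weights plus one bias per filter, and a Dense layer has in × out weights plus one bias per output. A quick check in plain Python (the helper names are ours, for illustration only):

```python
def conv2d_params(kh, kw, c_in, filters):
    # kernel weights plus one bias per filter
    return kh * kw * c_in * filters + filters

def dense_params(n_in, n_out):
    # weight matrix plus one bias per output unit
    return n_in * n_out + n_out

counts = [
    conv2d_params(3, 3, 1, 6),    # conv2d_7  -> 60
    conv2d_params(5, 5, 6, 16),   # conv2d_8  -> 2416
    dense_params(400, 120),       # dense_34  -> 48120
    dense_params(120, 84),        # dense_35  -> 10164
    dense_params(84, 10),         # dense_36  -> 850
]
total = sum(counts)               # 61610, matching "Total params: 61,610"
```

The pooling and flatten layers contribute zero parameters, exactly as the summary shows.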

2. Getting and setting weights

model.get_layer('conv2d_7')      # retrieve a layer object by name or index
<keras.layers.convolutional.Conv2D at 0x1ed425bce10>
weights = model.get_weights()    # returns the model's weight tensors as a list of numpy arrays
model.set_weights(weights)    # loads weights from numpy arrays; shapes must match model.get_weights()
# inspect the model's layers
model.layers 
[<keras.layers.convolutional.Conv2D at 0x1ed425bce10>,
 <keras.layers.pooling.MaxPooling2D at 0x1ed4267a4a8>,
 <keras.layers.convolutional.Conv2D at 0x1ed4267a898>,
 <keras.layers.pooling.MaxPooling2D at 0x1ed4266bb00>,
 <keras.layers.core.Flatten at 0x1ed4267ebe0>,
 <keras.layers.core.Dense at 0x1ed426774a8>,
 <keras.layers.core.Dense at 0x1ed42684940>,
 <keras.layers.core.Dense at 0x1ed4268edd8>]

3.6 Saving and loading models

Reference: how to save a Keras model

  • Use model.save(filepath) to save a Keras model and its weights in a single HDF5 file, which contains:

    • the model's architecture (so the model can be reconstructed)
    • the model's weights
    • the training configuration (loss function, optimizer, etc.)
    • the optimizer's state (so training can resume where it left off)
  • Use keras.models.load_model(filepath) to re-instantiate the model; if the file stores a training configuration, this also compiles the model

# save the model weights to the given path as an HDF5 file (.h5 extension)
filepath = 'E:/Graphs/Models/lenet.h5'
model.save_weights(filepath)

# load weights from an HDF5 file into the current model; by default the
# architecture is assumed unchanged. To load weights into a different model
# (with some identical layers), set by_name=True so that only layers with
# matching names receive weights.
model.load_weights(filepath, by_name=False)
json_string = model.to_json()  # serialize the architecture to JSON (cf. model.get_config(), which returns the same information as a dict)
open('E:/Graphs/Models/lenet.json','w').write(json_string)
model.save_weights('E:/Graphs/Models/lenet_weights.h5')

# load the model architecture and weights
from keras.models import model_from_json
model = model_from_json(open('E:/Graphs/Models/lenet.json').read())
model.load_weights('E:/Graphs/Models/lenet_weights.h5')     

3.6.1 Saving only the model architecture, without weights or training configuration

  • Save as a JSON file
# save as JSON
json_string = model.to_json()  
open('E:/Graphs/Models/my_model_architecture.json','w').write(json_string)   
from keras.models import model_from_json
model = model_from_json(open('E:/Graphs/Models/my_model_architecture.json').read())  
  • Save as a YAML file
# save as YAML  
yaml_string = model.to_yaml()  
open('E:/Graphs/Models/my_model_architectrue.yaml','w').write(yaml_string)
from keras.models import model_from_yaml
model = model_from_yaml(open('E:/Graphs/Models/my_model_architectrue.yaml').read())

These operations serialize the model to a JSON or YAML file. The files are human-readable; if needed, you can even open and edit them by hand. You can, of course, also load a model back from a saved JSON or YAML file.
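The round trip through a text format can be illustrated with the standard json module on a small hand-made config dict. The dict below mimics the shape of a Keras layer config but is written by hand for illustration, not produced by Keras:

```python
import json

# a hand-made, Keras-style layer config (illustrative only)
config = {"class_name": "Dense",
          "config": {"units": 10, "activation": "softmax", "name": "dense_1"}}

json_string = json.dumps(config)    # serialize to a human-readable string
restored = json.loads(json_string)  # load it back

assert restored == config           # nothing is lost in the round trip
```

Because only plain dicts, strings, and numbers are involved, the serialized file can be edited in any text editor and re-loaded, which is exactly what makes architecture-only saving convenient.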

3.6.2 Saving the architecture, trained weights, and optimizer state during training

Keras's callback mechanism lets code run at the appropriate points during training, which makes it possible to save the model and its parameters in real time:

keras.callbacks.ModelCheckpoint(
    filepath, 
    monitor='val_loss', 
    verbose=0, 
    save_best_only=False, 
    save_weights_only=False, 
    mode='auto', 
    period=1
)
  1. filepath: string, path where the model is saved
  2. monitor: the quantity to monitor
  3. verbose: verbosity mode, 0 or 1
  4. save_best_only: when True, only the model that performs best on the validation set is kept
  5. mode: one of 'auto', 'min', 'max'; with save_best_only=True, this decides how the best model is judged. For example, when monitoring val_acc the mode should be max; when monitoring val_loss it should be min. In auto mode the criterion is inferred from the name of the monitored quantity.
  6. save_weights_only: if True, only the model weights are saved; otherwise the whole model (architecture, configuration, etc.) is saved
  7. period: number of epochs between checkpoints
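The interaction of save_best_only and mode can be sketched as a tiny piece of state, tracking the best value seen so far. This is hypothetical illustration code, not the actual Keras implementation:

```python
import math

def make_checkpointer(mode='min'):
    """Return a function that reports whether the monitored value improved."""
    state = {'best': math.inf if mode == 'min' else -math.inf}

    def improved(value):
        # 'min' mode: smaller is better (e.g. val_loss);
        # 'max' mode: larger is better (e.g. val_acc)
        better = value < state['best'] if mode == 'min' else value > state['best']
        if better:
            state['best'] = value  # only then would a checkpoint be written
        return better

    return improved

check = make_checkpointer(mode='min')  # e.g. monitoring val_loss
decisions = [check(v) for v in [0.9, 0.7, 0.8, 0.6]]
# 0.9 (first value), 0.7, and 0.6 each improve on the previous best; 0.8 does not
```

With save_best_only=True, a file is written only on the steps where this returns True, which is why the training logs below alternate between "improved ... saving model" and "did not improve".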

3.6.3 Example

Suppose the original model is:

x=np.array([[0,1,0],[0,0,1],[1,3,2],[3,2,1]])
y=np.array([0,0,1,1]).T

model=Sequential()
model.add(Dense(5,input_shape=(x.shape[1],),activation='relu', name='layer1'))
model.add(Dense(4,activation='relu',name='layer2'))
model.add(Dense(1,activation='sigmoid',name='layer3'))
model.compile(optimizer='sgd',loss='mean_squared_error')

model.fit(x, y, epochs=200, verbose=0)   # train
model.save_weights('E:/Graphs/Models/my_weights.h5')
model.predict(x[0:1])   # predict
array([[ 0.38783705]], dtype=float32)
# new model
model = Sequential()
model.add(Dense(5, input_dim=3, name="layer1"))  # will be loaded (name and shape match)
model.add(Dense(10, name="new_dense"))  # will not be loaded

# load weights from the first model; this only affects the first layer, layer1.
model.load_weights('E:/Graphs/Models/my_weights.h5', by_name=True)
model.predict(x[1:2])
array([[-0.27631092, -0.35040742, -0.2807056 , -0.22762418, -0.31791407,
        -0.0897391 ,  0.02615392, -0.15040982,  0.19909057, -0.38647971]], dtype=float32)

3.7 How to Check-Point Deep Learning Models in Keras

# Checkpoint the weights when validation accuracy improves
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint
import matplotlib.pyplot as plt
import numpy as np

x=np.array([[0,1,0],[0,0,1],[1,3,2],[3,2,1]])
y=np.array([0,0,1,1]).T

model=Sequential()
model.add(Dense(5,input_shape=(x.shape[1],),activation='relu', name='layer1'))
model.add(Dense(4,activation='relu',name='layer2'))
model.add(Dense(1,activation='sigmoid',name='layer3'))
# Compile model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

filepath="E:/Graphs/Models/weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]
# Fit the model
model.fit(x, y, validation_split=0.33, epochs=150, batch_size=10, callbacks=callbacks_list, verbose=0)
Epoch 00000: val_acc improved from -inf to 1.00000, saving model to E:/Graphs/Models/weights-improvement-00-1.00.hdf5
Epoch 00001: val_acc did not improve
... (epochs 00002 through 00149: val_acc did not improve)

<keras.callbacks.History at 0x1ed46f00ac8>

3.8 Checkpoint Best Neural Network Model Only

# Checkpoint the weights for best model on validation accuracy
import keras
from keras.layers import Input, Dense
from keras.models import Model
from keras.callbacks import ModelCheckpoint
import matplotlib.pyplot as plt

# a layer instance is callable on a tensor, and returns a tensor
inputs = Input(shape=(100,))

x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)

# takes x as input and outputs the class probabilities
predictions = Dense(100, activation='softmax')(x)

# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = keras.utils.to_categorical(np.random.randint(2, size=(1000, 1)), num_classes=100)

# checkpoint
filepath="E:/Graphs/Models/weights.best.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
callbacks_list = [checkpoint]
# Fit the model
model.fit(data, labels, validation_split=0.33, epochs=15, batch_size=10, callbacks=callbacks_list, verbose=0)
Epoch 00000: val_acc improved from -inf to 0.48036, saving model to E:/Graphs/Models/weights.best.hdf5
Epoch 00001: val_acc improved from 0.48036 to 0.51360, saving model to E:/Graphs/Models/weights.best.hdf5
Epoch 00002: val_acc did not improve
Epoch 00003: val_acc did not improve
Epoch 00004: val_acc improved from 0.51360 to 0.52568, saving model to E:/Graphs/Models/weights.best.hdf5
Epoch 00005: val_acc did not improve
Epoch 00006: val_acc improved from 0.52568 to 0.52568, saving model to E:/Graphs/Models/weights.best.hdf5
Epoch 00007: val_acc did not improve
Epoch 00008: val_acc did not improve
Epoch 00009: val_acc did not improve
Epoch 00010: val_acc did not improve
Epoch 00011: val_acc did not improve
Epoch 00012: val_acc did not improve
Epoch 00013: val_acc did not improve
Epoch 00014: val_acc did not improve

<keras.callbacks.History at 0x1a276ec1be0>

3.9 Loading a Check-Pointed Neural Network Model

# How to load and use weights from a checkpoint
import keras
from keras.models import Sequential
from keras.layers import Dense
from keras.callbacks import ModelCheckpoint
import matplotlib.pyplot as plt

# create model
model = Sequential()
model.add(Dense(64, input_dim=100, kernel_initializer='uniform', activation='relu'))
model.add(Dense(64, kernel_initializer='uniform', activation='relu'))
model.add(Dense(100, kernel_initializer='uniform', activation='sigmoid'))
# load weights
model.load_weights("E:/Graphs/Models/weights.best.hdf5")
# Compile model (required to make predictions)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
print("Created model and loaded weights from file")

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = keras.utils.to_categorical(np.random.randint(2, size=(1000, 1)), num_classes=100)

# estimate accuracy on whole dataset using loaded weights
scores = model.evaluate(data, labels, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
Created model and loaded weights from file
acc: 99.00%

3.10 How to limit GPU memory usage in Keras

This section is based on a collection of issues about multi-GPU, multi-user use of theano/tensorflow (see: Limit the resource usage for tensorflow backend · Issue #1538 · fchollet/keras · GitHub).

Keras tends to grab all available GPU memory; you can adjust this by reconfiguring the TensorFlow backend's GPU memory allocation:

import tensorflow as tf
from keras.backend.tensorflow_backend import set_session
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.3
set_session(tf.Session(config=config))

Note that although this sets a threshold on the fraction of GPU memory to occupy, a running program can still exceed it if it needs to. In other words, training on a large dataset will still consume more memory; the limit above merely avoids wasting memory when running on small datasets.
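An alternative to reserving a fixed fraction is to let TensorFlow grow its allocation on demand. In the same TF1-era API this is done with the allow_growth option (a configuration sketch; it assumes a TensorFlow 1.x backend with GPU support and cannot be run without one):

```python
import tensorflow as tf
from keras.backend.tensorflow_backend import set_session

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory as needed instead of up front
set_session(tf.Session(config=config))
```

This is convenient when several people share one GPU, since each process only claims what it actually uses.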


Tips

Training and saving models more systematically

import keras
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Dense, Activation, Flatten, Input
(x_train, y_train), (x_test, y_test) = mnist.load_data()
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
x_train.shape
(60000, 28, 28)
import keras
from keras.layers import Input, Dense
from keras.models import Model
from keras.callbacks import ModelCheckpoint

# a layer instance is callable on a tensor, and returns a tensor
inputs = Input(shape=(28, 28))

x = Flatten()(inputs)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# takes x as input and outputs the class probabilities
predictions = Dense(10, activation='softmax')(x)

# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_6 (InputLayer)         (None, 28, 28)            0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 784)               0         
_________________________________________________________________
dense_16 (Dense)             (None, 64)                50240     
_________________________________________________________________
dense_17 (Dense)             (None, 64)                4160      
_________________________________________________________________
dense_18 (Dense)             (None, 10)                650       
=================================================================
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________
filepath = 'E:/Graphs/Models/model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
# fit model
model.fit(x_train, y_train, epochs=20, verbose=2, batch_size=64, callbacks=[checkpoint], validation_data=(x_test, y_test))
Train on 60000 samples, validate on 10000 samples
Epoch 1/20
Epoch 00000: val_loss improved from inf to 6.25477, saving model to E:/Graphs/Models/model-ep000-loss6.835-val_loss6.255.h5
10s - loss: 6.8349 - acc: 0.5660 - val_loss: 6.2548 - val_acc: 0.6063
Epoch 2/20
Epoch 00001: val_loss improved from 6.25477 to 5.75301, saving model to E:/Graphs/Models/model-ep001-loss5.981-val_loss5.753.h5
7s - loss: 5.9805 - acc: 0.6246 - val_loss: 5.7530 - val_acc: 0.6395
Epoch 3/20
Epoch 00002: val_loss did not improve
5s - loss: 5.8032 - acc: 0.6368 - val_loss: 5.9562 - val_acc: 0.6270
Epoch 4/20
Epoch 00003: val_loss improved from 5.75301 to 5.69140, saving model to E:/Graphs/Models/model-ep003-loss5.816-val_loss5.691.h5
7s - loss: 5.8163 - acc: 0.6363 - val_loss: 5.6914 - val_acc: 0.6451
Epoch 5/20
Epoch 00004: val_loss did not improve
6s - loss: 5.7578 - acc: 0.6404 - val_loss: 5.8904 - val_acc: 0.6317
Epoch 6/20
Epoch 00005: val_loss did not improve
7s - loss: 5.7435 - acc: 0.6417 - val_loss: 5.8636 - val_acc: 0.6342
Epoch 7/20
Epoch 00006: val_loss improved from 5.69140 to 5.68394, saving model to E:/Graphs/Models/model-ep006-loss5.674-val_loss5.684.h5
7s - loss: 5.6743 - acc: 0.6458 - val_loss: 5.6839 - val_acc: 0.6457
Epoch 8/20
Epoch 00007: val_loss improved from 5.68394 to 5.62847, saving model to E:/Graphs/Models/model-ep007-loss5.655-val_loss5.628.h5
6s - loss: 5.6552 - acc: 0.6472 - val_loss: 5.6285 - val_acc: 0.6488
Epoch 9/20
Epoch 00008: val_loss did not improve
6s - loss: 5.6277 - acc: 0.6493 - val_loss: 5.7295 - val_acc: 0.6422
Epoch 10/20
Epoch 00009: val_loss improved from 5.62847 to 5.55242, saving model to E:/Graphs/Models/model-ep009-loss5.577-val_loss5.552.h5
6s - loss: 5.5769 - acc: 0.6524 - val_loss: 5.5524 - val_acc: 0.6540
Epoch 11/20
Epoch 00010: val_loss improved from 5.55242 to 5.53212, saving model to E:/Graphs/Models/model-ep010-loss5.537-val_loss5.532.h5
6s - loss: 5.5374 - acc: 0.6550 - val_loss: 5.5321 - val_acc: 0.6560
Epoch 12/20
Epoch 00011: val_loss improved from 5.53212 to 5.53056, saving model to E:/Graphs/Models/model-ep011-loss5.549-val_loss5.531.h5
6s - loss: 5.5492 - acc: 0.6543 - val_loss: 5.5306 - val_acc: 0.6553
Epoch 13/20
Epoch 00012: val_loss improved from 5.53056 to 5.48013, saving model to E:/Graphs/Models/model-ep012-loss5.558-val_loss5.480.h5
7s - loss: 5.5579 - acc: 0.6538 - val_loss: 5.4801 - val_acc: 0.6587
Epoch 14/20
Epoch 00013: val_loss did not improve
6s - loss: 5.5490 - acc: 0.6547 - val_loss: 5.5233 - val_acc: 0.6561
Epoch 15/20
Epoch 00014: val_loss did not improve
7s - loss: 5.5563 - acc: 0.6541 - val_loss: 5.4960 - val_acc: 0.6580
Epoch 16/20
Epoch 00015: val_loss did not improve
6s - loss: 5.5364 - acc: 0.6554 - val_loss: 5.5200 - val_acc: 0.6567
Epoch 17/20
Epoch 00016: val_loss did not improve
6s - loss: 5.5081 - acc: 0.6571 - val_loss: 5.5577 - val_acc: 0.6544
Epoch 18/20
Epoch 00017: val_loss did not improve
6s - loss: 5.5281 - acc: 0.6560 - val_loss: 5.5768 - val_acc: 0.6530
Epoch 19/20
Epoch 00018: val_loss did not improve
6s - loss: 5.5146 - acc: 0.6567 - val_loss: 5.7057 - val_acc: 0.6447
Epoch 20/20
Epoch 00019: val_loss improved from 5.48013 to 5.46820, saving model to E:/Graphs/Models/model-ep019-loss5.476-val_loss5.468.h5
7s - loss: 5.4757 - acc: 0.6592 - val_loss: 5.4682 - val_acc: 0.6601

<keras.callbacks.History at 0x25b5ae27630>

The model is saved whenever val_loss improves; when it does not improve, nothing is saved.


