Classifying Fashion-MNIST Data with a Siamese Network


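Project introduction

This project was an assignment for an AI course I took at Queensland University of Technology, and I completed it together with 潘永瑞 (Pan Yongrui).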

Data

This project uses Fashion-MNIST, the dataset built into Keras and loaded with keras.datasets.fashion_mnist.load_data. It consists of labeled images in the classes ["top", "trouser", "pullover", "coat", "sandal", "ankle boot", "dress", "sneaker", "bag", "shirt"]. Every image is a 28x28 grayscale image.
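For reference, the integer labels returned by load_data map to class names as sketched below. The mapping is the standard Fashion-MNIST one rather than something stated in this post, and the names CLASS_NAMES, TRAIN_LABELS and UNSEEN_LABELS are illustrative. The two label lists match the label sets used in the slicing code later on.

```python
# Standard Fashion-MNIST label indices, using the short class names from this post.
CLASS_NAMES = {
    0: "top", 1: "trouser", 2: "pullover", 3: "dress", 4: "coat",
    5: "sandal", 6: "shirt", 7: "sneaker", 8: "bag", 9: "ankle boot",
}

# Labels used for training and labels held out entirely (illustrative names).
TRAIN_LABELS = [0, 1, 2, 4, 5, 9]
UNSEEN_LABELS = [3, 6, 7, 8]

train_names = [CLASS_NAMES[i] for i in TRAIN_LABELS]
unseen_names = [CLASS_NAMES[i] for i in UNSEEN_LABELS]
print(train_names)
print(unseen_names)
```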

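Siamese network architecture

A Siamese network consists of two identical subnetworks that share the same weights and feed into a distance layer. The figure below shows the structure.

[Figure: structure of the Siamese network]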
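The idea of shared weights plus a distance layer can be sketched in a few lines of NumPy. This is only a toy illustration, not the network used in this project: the single random weight matrix W, the 16-dimensional embedding and the function names are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(784, 16))  # one weight matrix, used by both branches

def embed(image):
    """Toy shared 'subnetwork': flatten, linear map, ReLU."""
    return np.maximum(image.reshape(-1) @ W, 0.0)

def siamese_distance(img_a, img_b):
    """Euclidean distance between the two embeddings."""
    return float(np.linalg.norm(embed(img_a) - embed(img_b)))

img_1 = rng.random((28, 28))
img_2 = rng.random((28, 28))
d_same = siamese_distance(img_1, img_1)  # identical inputs give distance 0
d_diff = siamese_distance(img_1, img_2)
print(d_same, d_diff)
```

Because both inputs pass through the same W, identical images always map to the same embedding, which is what lets the distance be read as a similarity score.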

 

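Environment

This project uses Python 3.7. Keras is at its core and is used to build the network and the classifier. NumPy and Matplotlib's pyplot are also imported, for slicing the dataset and for plotting, respectively. The imports are: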

import random
import tensorflow as tf
from tensorflow import keras
from keras.layers import Input, Flatten, Dense, Dropout, Lambda, MaxPooling2D
from keras.models import Model, Sequential
from keras.optimizers import RMSprop
from keras import backend as K
from keras.layers.convolutional import Conv2D
from keras.layers import LeakyReLU
from keras.regularizers import l2
from tensorflow.keras import regularizers
import numpy as np
import matplotlib.pyplot as plt

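Project goals

Train the network on images from the classes ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"].

Evaluate how well the network generalizes in three ways:

1. Evaluate on a test set drawn from the training classes ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"].

2. Evaluate on a test set that combines the training classes ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"] with the classes not used in training, ["dress", "sneaker", "bag", "shirt"].

3. Evaluate on a test set drawn only from the classes not used in training, ["dress", "sneaker", "bag", "shirt"].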

 

Code walkthrough

Loading and inspecting the data

(x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()  

print('train_images : ', x_train.shape, x_train.dtype)  
print('train_labels : ', y_train.shape, y_train.dtype)  
print('test_images : ', x_test.shape, x_test.dtype)  
print('test_labels : ', y_test.shape, y_test.dtype)  

Output:

train_images :  (60000, 28, 28) uint8

train_labels :  (60000,) uint8

test_images :  (10000, 28, 28) uint8

test_labels :  (10000,) uint8

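As we can see, the dataset provides a training set of 60,000 images and a test set of 10,000 images.

Let's see what the first image in the training set looks like.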

plt.figure()  
plt.imshow(x_train[0, :])  
plt.colorbar()  
plt.grid(False)  
plt.show()  

 

 

We can see that this is an image of an ankle boot.

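Data normalization

We normalize the data by dividing the training and test sets by 255, which brings the values into the range 0-1. A pixel with value 1 can then be treated as black and a pixel with value 0 as white.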

x_train = x_train.astype('float32')  
x_test = x_test.astype('float32')  
x_train = x_train / 255.0  
x_test = x_test / 255.0  

Data slicing

We slice the data by class in preparation for the steps that follow.

# Indices of the training classes, by Fashion-MNIST label:
# 0 top, 1 trouser, 2 pullover, 4 coat, 5 sandal, 9 ankle boot
# (lists rather than sets, so the class order is deterministic)
digit_indices = [np.where(y_train == i)[0] for i in [0, 1, 2, 4, 5, 9]]
digit_indices = np.array(digit_indices)

# number of images available per class
n = min([len(digit_indices[d]) for d in range(6)])

# Keep 80% of the images with labels ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"]
# for training (and 20% for testing)
train_set_shape = n * 0.8
test_set_shape = n * 0.2
y_train_new = digit_indices[:, :int(train_set_shape)]
y_test_new = digit_indices[:, int(train_set_shape):]

# Keep 100% of the images with labels in ["dress", "sneaker", "bag", "shirt"] for testing
# (labels 3 dress, 6 shirt, 7 sneaker, 8 bag)
digit_indices_t = [np.where(y_train == i)[0] for i in [3, 6, 7, 8]]
y_test_new_2 = np.array(digit_indices_t)

print(y_train_new.shape)
print(y_test_new.shape)
print(y_test_new_2.shape)

(6, 4800)

(6, 1200)

(4, 6000)
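The slicing step can be checked on a toy label array. The sketch below is illustrative only: the 15 made-up labels and the names seen_classes, train_idx and test_idx are assumptions, but the np.where grouping and the 80/20 column split follow the same pattern as the code above.

```python
import numpy as np

# 15 toy labels, 5 images per class (made-up data)
labels = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
seen_classes = [0, 1]

# one row of image indices per class
indices = np.array([np.where(labels == c)[0] for c in seen_classes])

# first 80% of each row for training, the rest for testing
split = int(indices.shape[1] * 0.8)
train_idx = indices[:, :split]
test_idx = indices[:, split:]

print(indices)
print(train_idx.shape, test_idx.shape)
```

Note that the resulting arrays hold indices into the image array, not labels or images, which is why they are later passed to create_pairs together with x_train.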

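Creating image pairs

To build a classifier that can tell whether two images belong to the same class, we need to create pairs of images across the whole dataset.

Our approach is as follows:
1) For each image in a given class, we pair it with the next image in the same class. In the "top" class, for example, the first image is paired with the second, the second with the third, and so on. These are the positive pairs.
2) At the same time, we pick an image from a different class and pair it with the current image. For example, the first image in the "top" class may be paired with the first image in the "pullover" class. These are the negative pairs.
3) Each positive pair is labeled 1 and each negative pair 0, so every positive/negative couple contributes the labels [1, 0].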

def create_pairs(x, digit_indices):
    '''
    Positive and negative pair creation.
    Alternates between positive and negative pairs.
    '''
    pairs = []
    # labels are 1 or 0 and identify whether the pair is positive or negative
    labels = []

    class_num = digit_indices.shape[0]
    for d in range(class_num):
        for i in range(int(digit_indices.shape[1]) - 1):
            # use two neighbouring images from the same class to create a positive pair
            z1, z2 = digit_indices[d][i], digit_indices[d][i + 1]
            pairs += [[x[z1], x[z2]]]
            # pick a random other class (inc >= 1, so dn != d) to create a negative pair
            inc = random.randrange(1, class_num)
            dn = (d + inc) % class_num
            z1, z2 = digit_indices[d][i], digit_indices[dn][i]
            pairs += [[x[z1], x[z2]]]
            # add two labels: the first for the positive pair, the second for the negative pair
            labels += [1, 0]
    return np.array(pairs), np.array(labels)

 

An input/output sketch may help:

Input: [image1, image2, image3, ...] [label1, label2, label3, ...]
Output: [[image1, image2], [image1, image101], [image2, image3], [image2, image102], ...]   [1, 0, 1, 0, ...]

Suppose images 1-100 belong to the class 'cat' and images 101-200 to the class 'dog'. The pairs [cat, cat], [cat, dog] then receive the labels [1, 0].
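The cat/dog example can be run on plain indices. The sketch below is a simplified variant of create_pairs that returns index pairs instead of image pairs; the function name create_index_pairs, the seed parameter and the toy indices are assumptions made for illustration. With only two classes, the randomly chosen "other" class is always the remaining one, so the output is deterministic.

```python
import random
import numpy as np

def create_index_pairs(class_indices, seed=0):
    """Alternate positive and negative index pairs, as create_pairs does for images."""
    rng = random.Random(seed)
    num_classes, per_class = class_indices.shape
    pairs, labels = [], []
    for d in range(num_classes):
        for i in range(per_class - 1):
            # positive pair: neighbouring items of the same class
            pairs.append((class_indices[d][i], class_indices[d][i + 1]))
            # negative pair: same position in a randomly chosen other class
            dn = (d + rng.randrange(1, num_classes)) % num_classes
            pairs.append((class_indices[d][i], class_indices[dn][i]))
            labels += [1, 0]
    return np.array(pairs), np.array(labels)

cats = [0, 1, 2]        # toy indices of 'cat' images
dogs = [100, 101, 102]  # toy indices of 'dog' images
pairs, labels = create_index_pairs(np.array([cats, dogs]))
print(pairs.tolist())
print(labels.tolist())
```

Each class with k images yields 2 * (k - 1) pairs, which is why 6 classes of 4800 training images produce 57588 pairs.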

# build image pairs for the training set and for the two test sets
tr_pairs, tr_y = create_pairs(x_train, y_train_new)  
tr_pairs = tr_pairs.reshape(tr_pairs.shape[0], 2, 28, 28, 1)  
print(tr_pairs.shape)  
 
te_pairs_1, te_y_1 = create_pairs(x_train, y_test_new)  
te_pairs_1 = te_pairs_1.reshape(te_pairs_1.shape[0], 2, 28, 28, 1)  
print(te_pairs_1.shape)  

te_pairs_2, te_y_2 = create_pairs(x_train, y_test_new_2)  
te_pairs_2 = te_pairs_2.reshape(te_pairs_2.shape[0], 2, 28, 28, 1)  
print(te_pairs_2.shape)  

(57588, 2, 28, 28, 1)

(14388, 2, 28, 28, 1)

(47992, 2, 28, 28, 1)

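Base network

The base network is a CNN.

It starts with a convolution + ReLU layer that uses relatively large 7x7 filters, followed by a max-pooling layer, which shrinks the feature maps and so reduces both computation and overfitting. Next comes another convolution + ReLU layer with smaller 3x3 filters. A flatten layer then turns the multi-dimensional output into a one-dimensional vector for the fully connected layer that follows. Regularizers are applied in several layers to further reduce overfitting.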
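The layer sizes can be worked out by hand. The arithmetic below assumes the Keras defaults that the base network relies on (no padding, stride 1, 2x2 max pooling) together with the filter counts 32 and 64 and the 128-unit dense layer from the code; the helper conv_out and the variable names are illustrative.

```python
def conv_out(size, kernel):
    """Output width of a stride-1 convolution without padding."""
    return size - kernel + 1

after_conv1 = conv_out(28, 7)          # 28x28 -> 22x22, 32 channels
after_pool = after_conv1 // 2          # 2x2 max pooling -> 11x11
after_conv2 = conv_out(after_pool, 3)  # 11x11 -> 9x9, 64 channels
flat = after_conv2 * after_conv2 * 64  # flattened vector length

# weights + biases per layer
params_conv1 = 7 * 7 * 1 * 32 + 32
params_conv2 = 3 * 3 * 32 * 64 + 64
params_dense = flat * 128 + 128
total_params = params_conv1 + params_conv2 + params_dense

print(after_conv1, after_pool, after_conv2, flat)
print(params_conv1, params_conv2, params_dense, total_params)
```

Almost all of the parameters sit in the dense layer, which is one reason the regularizers matter there.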

def create_base_network(input_shape):
    '''
    Base network to be shared.
    '''
    inputs = Input(shape=input_shape)
    x = Conv2D(32, (7, 7), activation='relu', input_shape=input_shape,
               kernel_regularizer=regularizers.l2(0.01),
               bias_regularizer=regularizers.l1(0.01))(inputs)
    x = MaxPooling2D()(x)
    x = Conv2D(64, (3, 3), activation='relu',
               kernel_regularizer=regularizers.l2(0.01),
               bias_regularizer=regularizers.l1(0.01))(x)
    x = Flatten()(x)
    x = Dense(128, activation='relu',
              kernel_regularizer=regularizers.l2(0.01),
              bias_regularizer=regularizers.l1(0.01))(x)

    return Model(inputs, x)

input_shape = (28,28,1)  

base_network = create_base_network(input_shape)  

input_a = Input(shape=input_shape)  
input_b = Input(shape=input_shape)  

# because we re-use the same instance `base_network`,  
# the weights of the network  
# will be shared across the two branches  
processed_a = base_network(input_a)  
processed_b = base_network(input_b)  
base_network.summary()  # summary() prints the table itself

Loss function

 

# reference from keras example "https://github.com/keras-team/keras/blob/master/examples/mnist_siamese.py"
def euclidean_distance(vects):
    x, y = vects
    sum_square = K.sum(K.square(x - y), axis=1, keepdims=True)
    return K.sqrt(K.maximum(sum_square, K.epsilon()))

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

def contrastive_loss(y_true, y_pred):
    '''
    Contrastive loss from Hadsell-et-al.'06
    http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    '''
    margin = 1
    square_pred = K.square(y_pred)
    margin_square = K.square(K.maximum(margin - y_pred, 0))
    return K.mean(y_true * square_pred + (1 - y_true) * margin_square)

def compute_accuracy(y_true, y_pred):
    '''
    Compute classification accuracy with a fixed threshold on distances.
    '''
    pred = y_pred.ravel() < 0.5
    return np.mean(pred == y_true)


def accuracy(y_true, y_pred):
    '''
    Compute classification accuracy with a fixed threshold on distances.
    '''
    return K.mean(K.equal(y_true, K.cast(y_pred < 0.5, y_true.dtype)))

 

 

# add a lambda layer  
distance = Lambda(euclidean_distance,  
                  output_shape=eucl_dist_output_shape)([processed_a, processed_b])  

model = Model([input_a, input_b], distance)
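To make the contrastive loss concrete: for a label y (1 for a positive pair, 0 for a negative pair) and a predicted distance d, the function above computes the mean of y * d^2 + (1 - y) * max(margin - d, 0)^2. The NumPy version below mirrors that formula on two hand-picked pairs; the function name contrastive_loss_np and the sample values are assumptions for illustration.

```python
import numpy as np

def contrastive_loss_np(y_true, distance, margin=1.0):
    """NumPy mirror of the Keras contrastive_loss above."""
    y_true = np.asarray(y_true, dtype=float)
    distance = np.asarray(distance, dtype=float)
    # positive pairs are penalized for being far apart
    positive_term = y_true * distance ** 2
    # negative pairs are penalized only while closer than the margin
    negative_term = (1.0 - y_true) * np.maximum(margin - distance, 0.0) ** 2
    return float(np.mean(positive_term + negative_term))

# one positive pair at distance 0.2 and one negative pair at distance 0.3
loss_value = contrastive_loss_np([1, 0], [0.2, 0.3])
# a negative pair already beyond the margin contributes nothing
far_negative = contrastive_loss_np([0], [1.5])
print(loss_value, far_negative)
```

With margin = 1, the 0.5 threshold used by the accuracy functions sits halfway between the ideal distance for positive pairs (0) and the margin for negative pairs.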

Model training

With the Siamese network structure in place, we can start training the model on the training set.

# train  
epochs = 10
rms = RMSprop()
model.compile(loss=contrastive_loss, optimizer=rms, metrics=[accuracy])
history = model.fit([tr_pairs[:, 0], tr_pairs[:, 1]], tr_y,
                    batch_size=128,
                    epochs=epochs,
                    validation_data=([te_pairs_1[:, 0], te_pairs_1[:, 1]], te_y_1))

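Model prediction

From this point on, the model can make predictions. It takes the test data as input, and its accuracy can be assessed from the predictions.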

y_pred = model.predict([tr_pairs[:, 0], tr_pairs[:, 1]])  
tr_acc = compute_accuracy(tr_y, y_pred)  
y_pred = model.predict([te_pairs_1[:, 0], te_pairs_1[:, 1]])  
te_acc = compute_accuracy(te_y_1, y_pred)  
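The accuracy computation turns each predicted distance into a same-class/different-class decision with a fixed threshold of 0.5. A small worked example, using made-up distances and the illustrative name threshold_accuracy, shows the mechanics:

```python
import numpy as np

def threshold_accuracy(y_true, distances, threshold=0.5):
    """Pairs closer than the threshold are predicted to be the same class."""
    pred_same = np.asarray(distances).ravel() < threshold
    return float(np.mean(pred_same == np.asarray(y_true)))

# model output has shape (n_pairs, 1), hence the ravel() above
distances = np.array([[0.1], [0.9], [0.4], [0.7]])
y_true = np.array([1, 0, 0, 1])

acc = threshold_accuracy(y_true, distances)
print(acc)  # the first two pairs are classified correctly, the last two are not
```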

Model evaluation

1. Evaluation on a test set drawn from the training classes ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"]

 

 

 

 

* Accuracy on training set: 93.66%

* Accuracy on test set: 93.38%

 

2. Evaluation on a test set that combines the training classes ["top", "trouser", "pullover", "coat", "sandal", "ankle boot"] with the classes not used in training, ["dress", "sneaker", "bag", "shirt"]

* Accuracy on test set: 83.94%

 

3. Evaluation on a test set drawn only from the classes not used in training, ["dress", "sneaker", "bag", "shirt"]

* Accuracy on test set: 74.85%

 

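Conclusion

The Siamese network retains reasonable discriminative ability (74.85%) on classes it never saw during training. This ability to generalize makes it promising for the complexity of real-world applications.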

