MNIST Handwritten Digit Recognition (Linear Regression and CNN, Implemented by Hand and with Frameworks) (To Be Continued)


0-Background

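MNIST is the Hello World of Deep Learning, so it is a project worth working through at least once.

Code: Github. The blog and the code will both be updated as the practice continues.

This is my first blog post, so the writing may be unclear in places. Suggestions are very welcome. Thanks~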

0.1 Background knowledge:

  • Linear regression
  • CNN
      LeNet-5
      AlexNet
      ResNet
      VGG
  • Various regularization methods

0.2 Catalog

1-Prepare

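  • Numpy: an open-source numerical computing library
  • matplotlib: a 2D plotting library for Python
  • TensorFlow: an open-source machine learning framework
  • Keras: a high-level neural network API that runs on top of the TensorFlow, Theano, and CNTK backends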

2-MNIST

MNIST is a subset of the larger NIST dataset, made up of digits handwritten by 250 different people. It contains 60,000 training samples and 10,000 test samples.
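The MNIST files are stored in the IDX format: a big-endian header followed by raw unsigned bytes, one byte per pixel. Below is a minimal sketch of reading an image file's header. It uses a synthetic in-memory buffer rather than the real file, and the helper name `parse_idx_images` is hypothetical (it is not part of the code in this post).

```python
import struct

import numpy as np


def parse_idx_images(buf):
    """Parse an IDX3 image buffer into an array of shape (num, rows * cols)."""
    # Header: four big-endian unsigned 32-bit integers
    magic, num, rows, cols = struct.unpack('>IIII', buf[:16])
    if magic != 2051:  # 2051 is the magic number of IDX3 image files
        raise ValueError('Not an IDX3 image buffer, magic=%d' % magic)
    # Everything after the 16-byte header is pixel data, one uint8 per pixel
    pixels = np.frombuffer(buf, dtype=np.uint8, offset=16)
    return pixels.reshape(num, rows * cols)


# Synthetic stand-in for a real file: 2 blank images of 28x28 pixels
header = struct.pack('>IIII', 2051, 2, 28, 28)
body = bytes(2 * 28 * 28)
images = parse_idx_images(header + body)
print(images.shape)
```

Label files follow the same idea with an 8-byte header (magic number 2049 and the item count), which is why the loader reads 8 bytes for labels and 16 bytes for images.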
Loading MNIST

import numpy as np
import os
import struct
import matplotlib.pyplot as plt


class load:
    def __init__(self,
                 path='mnist'):
        self.path = path

    def load_mnist(self):
        """Read train and test dataset and labels from path"""

        train_image_path = 'train-images.idx3-ubyte'
        train_label_path = 'train-labels.idx1-ubyte'

        test_image_path = 't10k-images.idx3-ubyte'
        test_label_path = 't10k-labels.idx1-ubyte'

        with open(os.path.join(self.path, train_label_path), 'rb') as labelpath:
            magic, n = struct.unpack('>II', labelpath.read(8))
            labels = np.fromfile(labelpath, dtype=np.uint8)
            train_labels = labels.reshape(len(labels), 1)

        with open(os.path.join(self.path, train_image_path), 'rb') as imgpath:
            magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
            images = np.fromfile(imgpath,
                                 dtype=np.uint8).reshape(len(train_labels), 784)
            train_images = images

        with open(os.path.join(self.path, test_label_path), 'rb') as labelpath:
            magic, n = struct.unpack('>II', labelpath.read(8))
            labels = np.fromfile(labelpath,
                                 dtype=np.uint8)
            test_labels = labels.reshape(len(labels), 1)

        with open(os.path.join(self.path, test_image_path), 'rb') as imgpath:
            magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
            images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(test_labels), 784)
            test_images = images

        return train_images, train_labels, test_images, test_labels


if __name__ == '__main__':
    train_images, train_labels, test_images, test_labels = load().load_mnist()
    print('train_images shape:%s' % str(train_images.shape))
    print('train_labels shape:%s' % str(train_labels.shape))
    print('test_images shape:%s' % str(test_images.shape))
    print('test_labels shape:%s' % str(test_labels.shape))

    np.random.seed(1024)

    # Pick 4 random training samples and 2 random test samples
    trainImage = np.random.randint(60000, size=4)
    testImage = np.random.randint(10000, size=2)

    # Collect (image, label) pairs; each 784-pixel row is reshaped back to 28x28
    samples = [(train_images[i].reshape(28, 28), train_labels[i, 0]) for i in trainImage]
    samples += [(test_images[i].reshape(28, 28), test_labels[i, 0]) for i in testImage]

    plt.figure(num='mnist', figsize=(6, 4))

    # Show the six samples in a 2x3 grid, each titled with its label
    for k, (img, label) in enumerate(samples, start=1):
        plt.subplot(2, 3, k)
        plt.title(str(label))
        plt.imshow(img)
    plt.show()

Running the script produces the output:

3-LinearRegression

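This section trains a recognizer on the MNIST dataset using the linear regression approach.
It uses a 2-layer network whose hidden layer has four neurons, with Tanh and ReLu each used as the hidden-layer activation function.

Because MNIST is a multi-class classification problem, the output layer uses Softmax as its activation function and cross entropy as the Loss Function.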
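A Softmax output with a cross entropy loss could be sketched as follows. This is an illustration under stated assumptions, not the code from this post: it assumes the labels are converted to one-hot columns of shape (10, m), and the helper names `softmax`, `one_hot` and `cross_entropy` are hypothetical.

```python
import numpy as np


def softmax(Z):
    """Column-wise softmax; Z has shape (n_classes, m)."""
    # Subtracting the column maximum avoids overflow in exp()
    Z_shift = Z - Z.max(axis=0, keepdims=True)
    expZ = np.exp(Z_shift)
    return expZ / expZ.sum(axis=0, keepdims=True)


def one_hot(labels, n_classes=10):
    """Turn labels of shape (m, 1) into a one-hot matrix of shape (n_classes, m)."""
    m = labels.shape[0]
    Y = np.zeros((n_classes, m))
    Y[labels.ravel(), np.arange(m)] = 1
    return Y


def cross_entropy(A, Y):
    """Mean cross entropy between predictions A and one-hot labels Y."""
    m = Y.shape[1]
    # The small constant guards against log(0)
    return -np.sum(Y * np.log(A + 1e-12)) / m


labels = np.array([[3], [0], [7]])
Y = one_hot(labels)
Z2 = np.zeros((10, 3))      # all-zero logits give a uniform prediction
A2 = softmax(Z2)
cost = cross_entropy(A2, Y)
print(A2.sum(axis=0))       # each column sums to 1
print(cost)                 # approximately log(10), about 2.3026
```

With uniform predictions over 10 classes the cost is about log(10), which is a handy sanity check for an untrained network.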

3.1 Implementation with Numpy

3.1.1 Get the layer sizes from the training data and labels

Code

def layer_size(X, Y):
    """
    Get the input and output layer sizes, and set the hidden layer size
    :param X: input dataset, shape (m, 784)
    :param Y: input labels, shape (m, 1)
    :return:
    n_x -- the size of the input layer
    n_h -- the size of the hidden layer
    n_y -- the size of the output layer
    """

    n_x = X.T.shape[0]
    n_h = 4
    n_y = Y.T.shape[0]

    return n_x, n_h, n_y
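A short usage sketch of `layer_size` on small dummy arrays shaped like the loader's output. Note that labels of shape (m, 1) give an output size of 1. The one-hot variant shown afterwards is an assumption about how a 10-way output layer might be sized, not something stated in the original code.

```python
import numpy as np


def layer_size(X, Y):
    """Same logic as above, repeated so this sketch is self-contained."""
    n_x = X.T.shape[0]
    n_h = 4
    n_y = Y.T.shape[0]
    return n_x, n_h, n_y


# Dummy data: 5 samples with 784 pixels each, and integer labels of shape (5, 1)
X = np.zeros((5, 784), dtype=np.uint8)
Y = np.array([[3], [0], [7], [1], [9]], dtype=np.uint8)
sizes_raw = layer_size(X, Y)
print(sizes_raw)        # (784, 4, 1)

# Assumption: one-hot labels of shape (m, 10) for a 10-class output layer
Y_one_hot = np.eye(10)[Y.ravel()]
sizes_one_hot = layer_size(X, Y_one_hot)
print(sizes_one_hot)    # (784, 4, 10)
```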

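3.1.2 Initialize parameters

Initialize W1, b1, W2, b2

W is initialized to small nonzero random values, so that the hidden neurons do not all start out identical

All b are initialized to 0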

Code

def initialize_parameters(n_x, n_h, n_y):
    """
    Initialize parameters
    :param n_x: the size of the input layer
    :param n_h: the size of the hidden layer
    :param n_y: the size of the output layer
    :return: dictionary of parameters
    """

    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))

    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2
                  }

    return parameters
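The reason W should not start at zero can be seen in a small experiment. In this sketch (the variable names and the random input are illustrative, not from the original code), an all-zero W1 makes every hidden neuron produce the same output, while small random weights break that symmetry.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((784, 3))     # 3 dummy samples stored as columns

W1_zero = np.zeros((4, 784))
W1_rand = rng.standard_normal((4, 784)) * 0.01

A1_zero = np.tanh(W1_zero @ X)   # hidden activations with zero weights
A1_rand = np.tanh(W1_rand @ X)   # hidden activations with small random weights

# With zero weights every row (neuron) is identical to the first one
zero_rows_identical = np.allclose(A1_zero, A1_zero[0])
rand_rows_identical = np.allclose(A1_rand, A1_rand[0])
print(zero_rows_identical)   # True
print(rand_rows_identical)   # False
```

Identical neurons also receive identical gradients, so they would stay identical during training. The biases can safely start at 0 because the random weights already break the symmetry.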

3.1.3 Forward Propagation

ReLu is implemented as \((|Z|+Z)/2\)
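A quick check, on an arbitrary example array, that this expression agrees with the more common max(0, Z) definition:

```python
import numpy as np

Z = np.array([[-2.0, -0.5, 0.0],
              [0.5, 2.0, -3.0]])

relu_abs = (np.abs(Z) + Z) / 2   # the form used in this post
relu_max = np.maximum(Z, 0)      # the usual definition

same = np.array_equal(relu_abs, relu_max)
print(relu_abs)
print(same)   # True
```

For negative entries |Z| and Z cancel to 0, and for positive entries they add up to 2Z, which the division by 2 brings back to Z.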

def ReLu(Z):
    """Element-wise ReLU, computed as (|Z| + Z) / 2."""
    return (abs(Z) + Z) / 2


def forward_propagation(X, parameters, activation="tanh"):
    """
    Compute the forward propagation
    :param X: input data (m, n_x)
    :param parameters: parameters from initialize_parameters
    :param activation: activation function name, either "tanh" or "relu"
    :return:
        A2: sigmoid output
        cache: cached intermediate results of the forward pass
    """

    X = X.T

    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]

    Z1 = np.dot(W1, X) + b1
    if activation == "tanh":
        A1 = np.tanh(Z1)
    elif activation == "relu":
        A1 = ReLu(Z1)
    else:
        raise ValueError('Activation function is not found: %s' % activation)
    Z2 = np.dot(W2, A1) + b2
    A2 = 1 / (1 + np.exp(-Z2))

    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}

    return A2, cache
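To make the shapes in the forward pass concrete, here is a compact trace on random dummy data. It mirrors the steps above with the tanh activation. The sizes are illustrative, and n_y = 10 is an assumption that corresponds to one-hot labels.

```python
import numpy as np

m, n_x, n_h, n_y = 5, 784, 4, 10   # n_y = 10 assumes one-hot labels
rng = np.random.default_rng(0)

X = rng.random((m, n_x))           # one sample per row, as returned by the loader
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

Z1 = W1 @ X.T + b1                 # (n_h, m); b1 is broadcast across the m columns
A1 = np.tanh(Z1)                   # (n_h, m)
Z2 = W2 @ A1 + b2                  # (n_y, m)
A2 = 1 / (1 + np.exp(-Z2))         # (n_y, m), sigmoid as in the function above

print(Z1.shape, A1.shape, Z2.shape, A2.shape)
```

Each column of A2 holds the outputs for one sample, so the transpose at the start of `forward_propagation` is what turns the loader's row-per-sample layout into the column-per-sample layout used by the weights.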

3.1.4 Compute Cost

