1. Problem
Kaggle hosts an image classification competition, Digit Recognizer, whose dataset is the famous MNIST: already-segmented 28*28 grayscale images, in which the pixels of the handwritten digit carry grayscale values from 0 to 255 and the background pixels are 0.
from keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train[0] # .shape = (28, 28)
"""
[[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]
...
[ 0 0 0 0 0 0 0 0 0 0 0 0 3 18 18 18 126 136
175 26 166 255 247 127 0 0 0 0]
[ 0 0 0 0 0 0 0 0 30 36 94 154 170 253 253 253 253 253
225 172 253 242 195 64 0 0 0 0]
...
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0]]
"""
The handwritten digit images look like this:
import matplotlib.pyplot as plt
plt.subplot(1, 3, 1)
plt.imshow(x_train[0], cmap='gray')
plt.subplot(1, 3, 2)
plt.imshow(x_train[1], cmap='gray')
plt.subplot(1, 3, 3)
plt.imshow(x_train[2], cmap='gray')
plt.show()
Handwritten digit recognition can therefore be treated as an image classification problem: classifying grayscale images represented as two-dimensional arrays.
2. Recognition
Rodrigo Benenson lists the error rates of 50 methods on MNIST. This article moves from traditional methods to deep learning and compares them by accuracy. The code below is based on Python 3.6 + sklearn 0.18.1 + keras 2.0.4.
Traditional methods
kNN
The idea is straightforward: flatten the 2D array into a 1D vector and judge similarity between vectors through a distance metric. Evidently this naive approach, lacking any feature extraction, throws away the most important information in the 2D array: the relationship between neighboring pixels. On a fairly clean dataset such as MNIST it still performs decently, with a macro-averaged precision of 96.927%. Also note that kNN is slow at prediction time; being a lazy learner, it merely stores the training set during "training" and defers all distance computations to query time.
from sklearn import neighbors
from sklearn.metrics import precision_score
# flatten each 28x28 image into a 784-dimensional vector
num_pixels = x_train[0].shape[0] * x_train[0].shape[1]  # 784
x_train = x_train.reshape((x_train.shape[0], num_pixels))
x_test = x_test.reshape((x_test.shape[0], num_pixels))
knn = neighbors.KNeighborsClassifier()
knn.fit(x_train, y_train)
pred = knn.predict(x_test)
precision_score(y_test, pred, average='macro') # 0.96927533865705706
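The similarity judgment underneath is nothing more than a vector distance. By default, sklearn's KNeighborsClassifier uses the Euclidean distance (Minkowski with p=2), which can be computed by hand for illustration (this snippet is not part of the classifier):
import numpy as np
a = x_train[0].astype('float64')   # first flattened digit (784 values)
b = x_train[1].astype('float64')   # second flattened digit
d = np.sqrt(np.sum((a - b) ** 2))  # Euclidean distance between the two images
print(d)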
MLP
A multilayer perceptron (MLP, Multi Layer Perceptron), i.e. a three-layer feed-forward neural network, uses features similar to the kNN approach: the grayscale value of each pixel corresponds to one input neuron, and the hidden layer has 700 neurons (generally a number between the sizes of the input and output layers). sklearn implements MLP classification in MLPClassifier; a Keras-based MLP implementation is given below, with a sketch of the sklearn variant after it. Without much careful tuning, the accuracy comes out at about 98.530%.
from keras.layers import Dense
from keras.models import Sequential
from keras.utils import np_utils
# normalize grayscale values to [0, 1]
num_pixels = 28 * 28
x_train = x_train.reshape(x_train.shape[0], num_pixels).astype('float32') / 255
x_test = x_test.reshape(x_test.shape[0], num_pixels).astype('float32') / 255
# one-hot encode the class labels
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_train.shape[1]
model = Sequential([
Dense(700, input_dim=num_pixels, activation='relu', kernel_initializer='normal'), # hidden layer
Dense(num_classes, activation='softmax', kernel_initializer='normal') # output layer
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit(x_train, y_train, validation_data=(x_test, y_test), epochs=600, batch_size=200, verbose=2)
model.evaluate(x_test, y_test, verbose=0) # [0.10381294689745164, 0.98529999999999995]
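For reference, the sklearn variant mentioned above can be sketched as follows. The hyperparameters here are illustrative assumptions rather than tuned values, and note that MLPClassifier expects integer labels (0-9), not the one-hot vectors used by the Keras code:
from sklearn.neural_network import MLPClassifier
# x_*: flattened and scaled to [0, 1] as above; y_*: original integer labels
mlp = MLPClassifier(hidden_layer_sizes=(700,), activation='relu',
                    solver='adam', max_iter=200)
mlp.fit(x_train, y_train)
mlp.score(x_test, y_test)  # mean accuracy on the test set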
Deep learning
LeCun proposed using CNNs (Convolutional Neural Networks) for handwritten digit recognition as early as 1989 [1], and later refined the architecture into LeNet-5 [2], whose structure is as follows:
Convolution, pooling, convolution, pooling, then two fully connected layers, finished off by a Gaussian connections layer. As is well known, a CNN extracts features by itself, so no feature extractor has to be designed by hand. An informal LeNet-5 implementation in Keras follows (it assumes the raw MNIST arrays are freshly loaded, i.e. still unscaled):
import keras
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Dropout, Flatten, Activation
from keras.models import Sequential
from keras.utils import np_utils
img_rows, img_cols = 28, 28
# TensorFlow backend: image_data_format() == 'channels_last'
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1).astype('float32') / 255
x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1).astype('float32') / 255
# one-hot encode the class labels
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_train.shape[1]
model = Sequential()
model.add(Conv2D(filters=6, kernel_size=(5, 5), padding='valid', input_shape=(28, 28, 1)))  # 28x28 -> 24x24
model.add(MaxPooling2D(pool_size=(2, 2)))  # 24x24 -> 12x12
model.add(Activation("sigmoid"))
model.add(Conv2D(16, kernel_size=(5, 5), padding='valid'))  # 12x12 -> 8x8
model.add(MaxPooling2D(pool_size=(2, 2)))  # 8x8 -> 4x4
model.add(Activation("sigmoid"))
model.add(Dropout(0.25))
# in the paper, C5 is a 5x5 convolution over 5x5 feature maps (32x32 input); with a
# 28x28 input the maps above are 4x4, so a 1x1 convolution with 120 filters stands in
model.add(Conv2D(120, kernel_size=(1, 1), padding='valid'))
model.add(Flatten())
# fully connected layer (F6 in the paper, 84 units)
model.add(Dense(84, activation='sigmoid'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.SGD(lr=0.08, momentum=0.9),
metrics=['accuracy'])
model.summary()
model.fit(x_train, y_train, batch_size=32, epochs=8,
verbose=1, validation_data=(x_test, y_test))
model.evaluate(x_test, y_test, verbose=0)  # accuracy ~98.640%
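Once trained, the model can be used for prediction directly; a small usage sketch:
import numpy as np
probs = model.predict(x_test[:3])     # class probabilities, shape (3, 10)
print(np.argmax(probs, axis=1))       # predicted digits
print(np.argmax(y_test[:3], axis=1))  # true digits (y_test is one-hot here)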
The accuracy of the three methods above:
特征 | 分類器 | 准確率 |
---|---|---|
gray | kNN | 96.927% |
gray | 3-layer neural networks | 98.530% |
Lenet-5 | 98.640% |
3. References
[1] LeCun, Yann, et al. "Backpropagation applied to handwritten zip code recognition." Neural computation 1.4 (1989): 541-551.
[2] LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
[3] Arnold, Taylor B. "Computer vision: LeNet-5, AlexNet, VGG-19, GoogLeNet."