LeNet-5 is a relatively simple convolutional neural network. The figure below shows its structure: the input two-dimensional image first passes through two rounds of convolution and pooling layers, then through fully connected layers, and finally a softmax classifier serves as the output layer.
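To make the conv -> pool -> conv -> pool -> fully connected -> softmax ordering concrete, here is a minimal sketch of a LeNet-5-style forward pass in the same TensorFlow 1.x style used later in this section. It is an illustrative assumption, not the original LeNet-5: it takes 28x28 MNIST-sized input and uses ReLU and max pooling, whereas the original network used 32x32 inputs, tanh activations and average pooling.

import tensorflow as tf

def lenet5_sketch(x):
    # Illustrative LeNet-5-style stack (assumed sizes, not the original parameters)
    x = tf.reshape(x, [-1, 28, 28, 1])                        # 2-D input image
    c1 = tf.layers.conv2d(x, 6, 5, activation=tf.nn.relu)     # first convolution
    p1 = tf.layers.max_pooling2d(c1, 2, 2)                    # first pooling
    c2 = tf.layers.conv2d(p1, 16, 5, activation=tf.nn.relu)   # second convolution
    p2 = tf.layers.max_pooling2d(c2, 2, 2)                    # second pooling
    flat = tf.layers.flatten(p2)
    f1 = tf.layers.dense(flat, 120, activation=tf.nn.relu)    # fully connected
    f2 = tf.layers.dense(f1, 84, activation=tf.nn.relu)       # fully connected
    logits = tf.layers.dense(f2, 10)                          # softmax is applied in the loss
    return logits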
(6) Data augmentation: 224x224 regions are randomly cropped from the 256x256 original images (together with their horizontal mirror images), which is equivalent to increasing the amount of data by a factor of 2*(256-224)^2 = 2048. Without data augmentation, relying only on the original data, a CNN with this many parameters would overfit; with data augmentation, overfitting is greatly reduced and generalization improves. At prediction time, five crops are taken (the four corners plus the center), each is also flipped horizontally, giving 10 images in total; predictions are made for all of them and the 10 results are averaged. The AlexNet paper also mentions applying PCA to the RGB values of the images and adding a Gaussian perturbation with standard deviation 0.1 to the principal components, introducing some noise; this trick lowers the error rate by another 1%.
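The crop-and-flip part of this training-time augmentation can be sketched with standard tf.image ops, as below. The function name and the assumption that the input is a single 256x256x3 image tensor are mine; the 10-crop test-time averaging and the PCA color perturbation are omitted, and the MNIST example that follows does not use augmentation at all.

import tensorflow as tf

def augment_for_training(image_256):
    # image_256: a [256, 256, 3] tensor; randomly crop a 224x224 region
    # and randomly mirror it horizontally, AlexNet-style.
    crop = tf.random_crop(image_256, [224, 224, 3])
    crop = tf.image.random_flip_left_right(crop)
    return crop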
# -*- coding=UTF-8 -*-
import tensorflow as tf
# Load the input data
import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

# Network hyperparameters
learning_rate = 0.001
training_iters = 200000
batch_size = 64
display_step = 20

# Network parameters
n_input = 784      # input dimensionality
n_classes = 10     # number of label classes
dropout = 0.8      # dropout keep probability

# Placeholder inputs
x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_classes])
keep_prob = tf.placeholder(tf.float32)

# Convolution operation
def conv2d(name, l_input, w, b):
    return tf.nn.relu(tf.nn.bias_add(
        tf.nn.conv2d(l_input, w, strides=[1, 1, 1, 1], padding='SAME'), b),
        name=name)

# Max-pooling (downsampling) operation
def max_pool(name, l_input, k):
    return tf.nn.max_pool(l_input, ksize=[1, k, k, 1],
                          strides=[1, k, k, 1], padding='SAME', name=name)

# Local response normalization operation
def norm(name, l_input, lsize=4):
    return tf.nn.lrn(l_input, lsize, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
                     name=name)

# Define the whole network
def alex_net(_X, _weights, _biases, _dropout):
    # Reshape the input vector into a 28x28x1 image
    _X = tf.reshape(_X, shape=[-1, 28, 28, 1])
    # Convolution layer
    conv1 = conv2d('conv1', _X, _weights['wc1'], _biases['bc1'])
    # Downsampling layer
    pool1 = max_pool('pool1', conv1, k=2)
    # Normalization layer
    norm1 = norm('norm1', pool1, lsize=4)
    # Dropout
    norm1 = tf.nn.dropout(norm1, _dropout)
    # Convolution
    conv2 = conv2d('conv2', norm1, _weights['wc2'], _biases['bc2'])
    # Downsampling
    pool2 = max_pool('pool2', conv2, k=2)
    # Normalization
    norm2 = norm('norm2', pool2, lsize=4)
    # Dropout
    norm2 = tf.nn.dropout(norm2, _dropout)
    # Convolution
    conv3 = conv2d('conv3', norm2, _weights['wc3'], _biases['bc3'])
    # Downsampling
    pool3 = max_pool('pool3', conv3, k=2)
    # Normalization
    norm3 = norm('norm3', pool3, lsize=4)
    # Dropout
    norm3 = tf.nn.dropout(norm3, _dropout)
    # Fully connected layer: first flatten the feature maps into a vector
    dense1 = tf.reshape(norm3, [-1, _weights['wd1'].get_shape().as_list()[0]])
    dense1 = tf.nn.relu(tf.matmul(dense1, _weights['wd1']) + _biases['bd1'], name='fc1')
    # Fully connected layer with ReLU activation
    dense2 = tf.nn.relu(tf.matmul(dense1, _weights['wd2']) + _biases['bd2'], name='fc2')
    # Output layer of the network
    out = tf.matmul(dense2, _weights['out']) + _biases['out']
    return out

# Store all the network parameters
weights = {
    'wc1': tf.Variable(tf.random_normal([3, 3, 1, 64])),
    'wc2': tf.Variable(tf.random_normal([3, 3, 64, 128])),
    'wc3': tf.Variable(tf.random_normal([3, 3, 128, 256])),
    'wd1': tf.Variable(tf.random_normal([4 * 4 * 256, 1024])),
    'wd2': tf.Variable(tf.random_normal([1024, 1024])),
    'out': tf.Variable(tf.random_normal([1024, 10]))
}
biases = {
    'bc1': tf.Variable(tf.random_normal([64])),
    'bc2': tf.Variable(tf.random_normal([128])),
    'bc3': tf.Variable(tf.random_normal([256])),
    'bd1': tf.Variable(tf.random_normal([1024])),
    'bd2': tf.Variable(tf.random_normal([1024])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

# Build the model
pred = alex_net(x, weights, biases, keep_prob)
# Define the loss function and the training step
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=pred))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Evaluate the network
correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
# Initialize all shared variables
init = tf.global_variables_initializer()

# Launch a training session
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until the maximum number of iterations is reached
    while step * batch_size < training_iters:
        # Fetch a batch of data
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys, keep_prob: dropout})
        if step % display_step == 0:
            # Compute the training accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            # Compute the loss
            loss = sess.run(cost, feed_dict={x: batch_xs, y: batch_ys, keep_prob: 1.})
            print("Iter " + str(step * batch_size) +
                  ", Minibatch Loss= " + "{:.6f}".format(loss) +
                  ", Training Accuracy= " + "{:.5f}".format(acc))
        step += 1
    print("Optimization Finished!")
    # Compute the test accuracy
    print("Testing Accuracy:",
          sess.run(accuracy, feed_dict={x: mnist.test.images[:256],
                                        y: mnist.test.labels[:256],
                                        keep_prob: 1.}))
The code above omits some of AlexNet's convolutional layers and uses specific weight shapes for the fully connected layers.
VGG
The names VGG-16 and VGG-19 come from the authors' research group, the Visual Geometry Group; the 16 and 19 refer to the depth of the networks.
VGG-16/VGG-19 has roughly 138M parameters and was the runner-up of ILSVRC 2014.
Basic framework of the VGG-16 architecture:
conv1^2 (64) -> pool1 -> conv2^2 (128) -> pool2 -> conv3^3 (256) -> pool3 -> conv4^3 (512) -> pool4 -> conv5^3 (512) -> pool5 -> fc6 (4096) -> fc7 (4096) -> fc8 (1000) -> softmax. Here ^3 means the block is repeated 3 times (and ^2 twice).
The network takes 224x224 images as input.
Characteristics of the VGG network
(1). Simple structure: all convolution kernels are replaced with 3x3 ones (1x1 is used only rarely); compared with AlexNet's pooling kernels, VGG uses 2x2 pooling kernels throughout.
(2). Large number of parameters, most of which are concentrated in the fully connected layers. The 16 in the network name means it has 16 conv/fc layers.
(3). Proper network initialization and the use of batch normalization layers are important for training deep networks.
(4). VGG-19 has a structure similar to VGG-16 and slightly better performance, but it consumes more resources, so VGG-16 is used more often in practice. Because the VGG-16 architecture is very simple and lends itself well to transfer learning, it is still widely used today.
def VGG16(images, _dropout, n_cls):
    """
    Weight initialization used here:
    - convolutional layers use parameters from a pre-trained model
    - fully connected layers use Xavier initialization
    """
    conv1_1 = conv(images, 64, 'conv1_1', fineturn=True)    #1
    conv1_2 = conv(conv1_1, 64, 'conv1_2', fineturn=True)   #2
    pool1 = maxpool(conv1_2, 'pool1')
    conv2_1 = conv(pool1, 128, 'conv2_1', fineturn=True)    #3
    conv2_2 = conv(conv2_1, 128, 'conv2_2', fineturn=True)  #4
    pool2 = maxpool(conv2_2, 'pool2')
    conv3_1 = conv(pool2, 256, 'conv3_1', fineturn=True)    #5
    conv3_2 = conv(conv3_1, 256, 'conv3_2', fineturn=True)  #6
    conv3_3 = conv(conv3_2, 256, 'conv3_3', fineturn=True)  #7
    pool3 = maxpool(conv3_3, 'pool3')
    conv4_1 = conv(pool3, 512, 'conv4_1', fineturn=True)    #8
    conv4_2 = conv(conv4_1, 512, 'conv4_2', fineturn=True)  #9
    conv4_3 = conv(conv4_2, 512, 'conv4_3', fineturn=True)  #10
    pool4 = maxpool(conv4_3, 'pool4')
    conv5_1 = conv(pool4, 512, 'conv5_1', fineturn=True)    #11
    conv5_2 = conv(conv5_1, 512, 'conv5_2', fineturn=True)  #12
    conv5_3 = conv(conv5_2, 512, 'conv5_3', fineturn=True)  #13
    pool5 = maxpool(conv5_3, 'pool5')
    # When training on your own data, it is better not to use
    # pre-trained parameters for the fully connected layers
    flatten = tf.reshape(pool5, [-1, 7*7*512])
    fc6 = fc(flatten, 4096, 'fc6', xavier=True)              #14
    dropout1 = tf.nn.dropout(fc6, _dropout)
    fc7 = fc(dropout1, 4096, 'fc7', xavier=True)             #15
    dropout2 = tf.nn.dropout(fc7, _dropout)
    fc8 = fc(dropout2, n_cls, 'fc8', xavier=True)            #16
    return fc8
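The VGG16 function above relies on conv, maxpool, and fc helpers that are not shown. The following is one possible minimal implementation, written as an assumption about what those helpers do (3x3 convolutions with 'SAME' padding, 2x2 max pooling, and plain fully connected layers); it ignores the pre-trained/fineturn loading path and assumes the earlier import tensorflow as tf, so it is a sketch rather than the author's actual helpers. In practice the final layer (fc8) would also omit the ReLU so that it returns raw logits.

def conv(x, out_channels, name, fineturn=False):
    # Assumed helper: 3x3 convolution + ReLU with 'SAME' padding.
    # Loading pre-trained weights (the fineturn flag) is not implemented here.
    in_channels = x.get_shape().as_list()[-1]
    with tf.variable_scope(name):
        w = tf.get_variable('weights', [3, 3, in_channels, out_channels])
        b = tf.get_variable('biases', [out_channels],
                            initializer=tf.zeros_initializer())
        y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
        return tf.nn.relu(tf.nn.bias_add(y, b))

def maxpool(x, name):
    # Assumed helper: 2x2 max pooling with stride 2.
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                          padding='SAME', name=name)

def fc(x, out_dim, name, xavier=False):
    # Assumed helper: fully connected layer with ReLU,
    # optionally using Xavier initialization for the weights.
    in_dim = x.get_shape().as_list()[-1]
    with tf.variable_scope(name):
        init = tf.contrib.layers.xavier_initializer() if xavier else None
        w = tf.get_variable('weights', [in_dim, out_dim], initializer=init)
        b = tf.get_variable('biases', [out_dim],
                            initializer=tf.zeros_initializer())
        return tf.nn.relu(tf.matmul(x, w) + b)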