『TensorFlow』SSD Source Code Study, Part 1: Paper and Open-Source Project Documentation
『TensorFlow』SSD Source Code Study, Part 2: Forward Architecture of the VGG-Based SSD Network
『TensorFlow』SSD Source Code Study, Part 4: Dataset Introduction and TFR File Generation
『TensorFlow』SSD Source Code Study, Part 5: TFR Data Reading & Data Preprocessing
To deepen my understanding, I reimplemented the SSD project. It is based on the original version, with changes made according to my own understanding.
The projects are on GitHub: SSD_Realization_TensorFlow and SSD_Realization_MXNet.
The implementation is organized around the steps of the training main function, which is pasted at the end of this post. Below we briefly go over the key points of each stage in that order. If you want more detail, I suggest reading the corresponding earlier source-code walkthrough posts (the TF version), or watching Dr. Mu Li's SSD introduction video (not very detailed, but together with the lecture notes the line of thought is very clear; see 『MXNet』Part 10: Object Detection with SSD).
1. Key Points
The SSD architecture has four main parts: network design, anchor box design, learning target processing, and the loss function implementation.
Network design
The key point is that each feature layer picked from the normal forward network gets two additional convolutional outputs, a classification branch and a regression branch, which produce the per-class scores and the 4 coordinate values for every anchor box attached to that layer.
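As a rough sketch (not the project's exact code), the two heads attached to one chosen feature layer could look like the following; detection_heads, num_anchors and num_classes are illustrative names here:

import tensorflow as tf
import tensorflow.contrib.slim as slim

def detection_heads(feat, num_anchors, num_classes):
    # classification branch: one score per anchor per class (background included)
    cls_pred = slim.conv2d(feat, num_anchors * (num_classes + 1), [3, 3],
                           activation_fn=None, scope='conv_cls')
    # regression branch: 4 coordinate offsets per anchor
    loc_pred = slim.conv2d(feat, num_anchors * 4, [3, 3],
                           activation_fn=None, scope='conv_loc')
    return cls_pred, loc_pred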
Anchor box design
Anchor boxes are tied to the network's feature layers: each layer carries a number of anchor boxes, and we need their position and shape information. In the TF version we store each box's center point together with its H and W; in the MXNet version we store the 4 coordinates of the top-left and bottom-right corners. The MXNet format is more intuitive, while the TF format saves space because a whole group of boxes shares a single center point; the anchor information is small either way, so the difference hardly matters.
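To make the two formats concrete, here is a hypothetical numpy helper (not from either repo) converting center/size storage into corner storage:

import numpy as np

def center_to_corner(cy, cx, h, w):
    # TF-style storage: a center point (cy, cx) shared by a group of boxes plus
    # each box's height/width; returns MXNet-style (ymin, xmin, ymax, xmax)
    return np.stack([cy - h / 2.0, cx - w / 2.0,
                     cy + h / 2.0, cx + w / 2.0], axis=-1)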
Learning target processing
Personally I find this the most tedious part. The information we need (and at this point already have) is: the set of anchor boxes (that is, the n×4 coordinates of all anchors), the image's labels, and the image's ground-truth box coordinates (4 per label). What we have to do is establish the link between the anchor boxes and the ground-truth annotations, and obtain:
the class assigned to each anchor box (the anchor takes the class of the ground-truth box it has the largest IoU with, falling back to class 0 when the overlap is too small, so a large number of class-0 anchors will appear)
the regression target for each anchor box's coordinates (found the same way; the unmatched slots are likewise 0)
the negative-class mask: although each image usually contains only a few annotated boxes, SSD generates a huge number of anchor boxes. As you can imagine, most anchors will not enclose any object of interest, which means their IoU with every ground-truth box is below some threshold. This produces a large number of negative anchors, i.e. anchors labeled 0. For these anchors two points must be considered:
1. The box-regression loss should not include the negative anchors, since they have no corresponding ground-truth box.
2. Because the negative anchors may far outnumber everything else, we can keep only some of them, and specifically the ones the model is currently least confident are negative: sort the predictions for class 0 and pick the hardest negative anchors, the ones with the smallest values.
So we need a mask that suppresses part of the computed loss (a sketch of this matching and mining logic follows below).
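A simplified numpy sketch of this matching plus hard-negative-mining logic is given below; the names (encode_targets, bg_prob and so on) are illustrative and not the actual API of either repo:

import numpy as np

def encode_targets(iou, gt_labels, bg_prob, iou_thresh=0.5, neg_ratio=3):
    # iou:       (num_anchors, num_gt) IoU of every anchor against every ground-truth box
    # gt_labels: (num_gt,) class labels, with 0 reserved for background
    # bg_prob:   (num_anchors,) predicted probability of class 0 for each anchor
    best_gt = iou.argmax(axis=1)                    # best ground-truth box per anchor
    best_iou = iou.max(axis=1)
    cls_target = gt_labels[best_gt].copy()
    cls_target[best_iou < iou_thresh] = 0           # weak overlaps become background
    pos_mask = cls_target > 0
    # hard negative mining: keep only the negatives the model is currently least
    # confident about (smallest class-0 probability), roughly neg_ratio per positive
    num_neg = min(int(neg_ratio * pos_mask.sum()) + 1, int((~pos_mask).sum()))
    neg_order = np.argsort(np.where(pos_mask, np.inf, bg_prob))
    neg_mask = np.zeros_like(pos_mask)
    neg_mask[neg_order[:num_neg]] = True
    return cls_target, pos_mask, neg_mask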
Loss function
There is not much to say here: implement it according to the paper's formulas. The key point is, again, applying the mask computed in the previous step to the loss values.
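As a minimal illustration of how the mask enters the loss (a hedged sketch, not the project's implementation), the localization term only counts the entries the mask keeps:

import numpy as np

def masked_smooth_l1(box_pred, box_target, box_mask):
    # box_mask zeroes out entries that belong to background anchors,
    # so they contribute nothing to the regression loss
    x = (box_pred - box_target) * box_mask
    absx = np.abs(x)
    return np.sum(np.where(absx < 1.0, 0.5 * x * x, absx - 0.5))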
2. MXNet Training Main Function
import time

import mxnet as mx
from mxnet import nd

# ssd_mx, util_mx and get_iterators are this project's own modules/helpers

if __name__ == '__main__':
    batch_size = 4
    ctx = mx.cpu(0)
    # ctx = mx.gpu(0)
    # box_metric = mx.MAE()
    cls_metric = mx.metric.Accuracy()
    ssd = ssd_mx.SSDNet()
    ssd.initialize(ctx=ctx)  # mx.init.Xavier(magnitude=2)

    cls_loss = util_mx.FocalLoss()
    box_loss = util_mx.SmoothL1Loss()

    trainer = mx.gluon.Trainer(ssd.collect_params(),
                               'sgd', {'learning_rate': 0.01, 'wd': 5e-4})

    data = get_iterators(data_shape=304, batch_size=batch_size)
    for epoch in range(30):
        # reset data iterators and metrics
        data.reset()
        cls_metric.reset()
        # box_metric.reset()
        tic = time.time()
        for i, batch in enumerate(data):
            start_time = time.time()
            x = batch.data[0].as_in_context(ctx)
            y = batch.label[0].as_in_context(ctx)
            # replace the -1 padding placeholders with background label 0;
            # the corresponding box coordinates are recorded as [0, 0, 0, 0]
            y = nd.where(y < 0, nd.zeros_like(y), y)

            with mx.autograd.record():
                # anchors:     anchor box coordinates, [1, n, 4]
                # class_preds: per-image class predictions for every anchor, [bs, n, num_cls + 1]
                # box_preds:   per-image box coordinate predictions, [bs, n * 4]
                anchors, class_preds, box_preds = ssd(x, True)
                # box_target: regression targets for the anchors, [bs, n * 4]
                # box_mask:   masks out the unwanted background anchors, [bs, n * 4]
                # cls_target: true class recorded for every anchor, [bs, n]
                box_target, box_mask, cls_target = ssd_mx.training_targets(anchors, class_preds, y)
                loss1 = cls_loss(class_preds, cls_target)
                loss2 = box_loss(box_preds, box_target, box_mask)
                loss = loss1 + loss2
            loss.backward()
            trainer.step(batch_size)

            if i % 1 == 0:
                duration = time.time() - start_time
                examples_per_sec = batch_size / duration
                sec_per_batch = float(duration)
                format_str = "[*] step %d, loss=%.2f (%.1f examples/sec; %.3f sec/batch)"
                print(format_str % (i, nd.sum(loss).asscalar(), examples_per_sec, sec_per_batch))
            if i % 500 == 0:
                ssd.model.save_parameters('model_mx_{}.params'.format(epoch))
3. TensorFlow Training Main Function
import tensorflow as tf
import tensorflow.contrib.slim as slim

# tfr_data_process, preprocess_img_tf, util_tf and SSDNet are this project's own modules

def main():
    max_steps = 1500
    batch_size = 32
    adam_beta1 = 0.9
    adam_beta2 = 0.999
    opt_epsilon = 1.0
    num_epochs_per_decay = 2.0
    num_samples_per_epoch = 17125
    moving_average_decay = None

    tf.logging.set_verbosity(tf.logging.DEBUG)
    with tf.Graph().as_default():
        # Create global_step.
        with tf.device("/device:CPU:0"):
            global_step = tf.train.create_global_step()

        ssd = SSDNet()
        ssd_anchors = ssd.anchors

        # placing the TFRecord parsing ops on the GPU can give a speed-up, but the effect is not stable
        dataset = \
            tfr_data_process.get_split('./TFR_Data',
                                       'voc2012_*.tfrecord',
                                       num_classes=21,
                                       num_samples=num_samples_per_epoch)

        with tf.device("/device:CPU:0"):  # only the CPU supports the queue ops
            image, glabels, gbboxes = \
                tfr_data_process.tfr_read(dataset)

            image, glabels, gbboxes = \
                preprocess_img_tf.preprocess_image(image, glabels, gbboxes, out_shape=(300, 300))

            gclasses, glocalisations, gscores = \
                ssd.bboxes_encode(glabels, gbboxes, ssd_anchors)

            batch_shape = [1] + [len(ssd_anchors)] * 3  # (1, n_feat_layers, n_feat_layers, n_feat_layers)

            # Training batches and queue.
            r = tf.train.batch(  # image, per-anchor classes, ground-truth box targets, scores
                util_tf.reshape_list([image, gclasses, glocalisations, gscores]),
                batch_size=batch_size,
                num_threads=4,
                capacity=5 * batch_size)
            batch_queue = slim.prefetch_queue.prefetch_queue(
                r,  # <----- the input format actually needs no adjustment here
                capacity=2 * 1)

        # Dequeue batch.
        b_image, b_gclasses, b_glocalisations, b_gscores = \
            util_tf.reshape_list(batch_queue.dequeue(), batch_shape)  # restore the nested list structure

        predictions, localisations, logits, end_points = \
            ssd.net(b_image, is_training=True, weight_decay=0.00004)

        ssd.losses(logits, localisations,
                   b_gclasses, b_glocalisations, b_gscores,
                   match_threshold=.5,
                   negative_ratio=3,
                   alpha=1,
                   label_smoothing=.0)

        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)

        # =================================================================== #
        # Configure the moving averages.
        # =================================================================== #
        if moving_average_decay:
            moving_average_variables = slim.get_model_variables()
            variable_averages = tf.train.ExponentialMovingAverage(
                moving_average_decay, global_step)
        else:
            moving_average_variables, variable_averages = None, None

        # =================================================================== #
        # Configure the optimization procedure.
        # =================================================================== #
        with tf.device("/device:CPU:0"):  # keep the learning_rate node on the CPU (reason unclear)
            decay_steps = int(num_samples_per_epoch / batch_size * num_epochs_per_decay)
            learning_rate = tf.train.exponential_decay(0.01,
                                                       global_step,
                                                       decay_steps,
                                                       0.94,  # learning_rate_decay_factor
                                                       staircase=True,
                                                       name='exponential_decay_learning_rate')
            optimizer = tf.train.AdamOptimizer(
                learning_rate,
                beta1=adam_beta1,
                beta2=adam_beta2,
                epsilon=opt_epsilon)
            tf.summary.scalar('learning_rate', learning_rate)

        if moving_average_decay:
            # Update ops executed locally by trainer.
            update_ops.append(variable_averages.apply(moving_average_variables))

        # Variables to train.
        trainable_scopes = None
        if trainable_scopes is None:
            variables_to_train = tf.trainable_variables()
        else:
            scopes = [scope.strip() for scope in trainable_scopes.split(',')]
            variables_to_train = []
            for scope in scopes:
                variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)
                variables_to_train.extend(variables)

        losses = tf.get_collection(tf.GraphKeys.LOSSES)
        regularization_losses = tf.get_collection(
            tf.GraphKeys.REGULARIZATION_LOSSES)
        regularization_loss = tf.add_n(regularization_losses)
        loss = tf.add_n(losses)
        tf.summary.scalar("loss", loss)
        tf.summary.scalar("regularization_loss", regularization_loss)

        grad = optimizer.compute_gradients(loss, var_list=variables_to_train)
        grad_updates = optimizer.apply_gradients(grad,
                                                 global_step=global_step)
        update_ops.append(grad_updates)

        # update_op = tf.group(*update_ops)
        with tf.control_dependencies(update_ops):
            total_loss = tf.add_n([loss, regularization_loss])
        tf.summary.scalar("total_loss", total_loss)

        # =================================================================== #
        # Kicks off the training.
        # =================================================================== #
        gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.8)
        config = tf.ConfigProto(log_device_placement=False,
                                gpu_options=gpu_options)
        saver = tf.train.Saver(max_to_keep=5,
                               keep_checkpoint_every_n_hours=1.0,
                               write_version=2,
                               pad_step_number=False)

        if True:
            import os
            import time

            print('start......')
            model_path = './logs'
            batch_size = batch_size
            with tf.Session(config=config) as sess:
                summary = tf.summary.merge_all()
                coord = tf.train.Coordinator()
                threads = tf.train.start_queue_runners(sess=sess, coord=coord)

                writer = tf.summary.FileWriter(model_path, sess.graph)

                init_op = tf.group(tf.global_variables_initializer(),
                                   tf.local_variables_initializer())
                init_op.run()
                for step in range(max_steps):
                    start_time = time.time()
                    loss_value = sess.run(total_loss)
                    # loss_value, summary_str = sess.run([train_tensor, summary_op])
                    # writer.add_summary(summary_str, step)

                    duration = time.time() - start_time

                    if step % 10 == 0:
                        summary_str = sess.run(summary)
                        writer.add_summary(summary_str, step)

                        examples_per_sec = batch_size / duration
                        sec_per_batch = float(duration)
                        format_str = "[*] step %d, loss=%.2f (%.1f examples/sec; %.3f sec/batch)"
                        print(format_str % (step, loss_value, examples_per_sec, sec_per_batch))

                    # if step % 100 == 0:
                    #     accuracy_step = test_cifar10(sess, training=False)
                    #     acc.append('{:.3f}'.format(accuracy_step))
                    #     print(acc)

                    if step % 500 == 0 and step != 0:
                        saver.save(sess, os.path.join(model_path, "ssd_tf.model"), global_step=step)

                coord.request_stop()
                coord.join(threads)
I trained the TensorFlow version for 50k steps; the loss curve is shown below.

That is actually not quite enough training. The first image is the detection result of the trained model released by the original open-source code; the second is the result of my version after 50k steps, and you can see the training is still insufficient (update: after further analysis, the poor result in the second image is mostly due to the non-maximum-suppression threshold being set too high; with a smaller value only one car is missed and everything else is detected correctly).


The lab machine's GPU is fast enough (a 1080Ti), but its cooling is a real problem; the machine kept rebooting, so getting through 50k steps was already an achievement, and I left it at that rather than attempting longer runs.
Update: this time I set up a 200k-step run; it actually rebooted from overheating a bit past 170k steps. Using the latest checkpoint and changing the NMS threshold to 0.5, detection now works:


