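Forked project repository: SSD
I. Generating the Input Labels
After preprocessing, the image, class labels and ground-truth boxes are still in a fairly raw form and cannot be fed to the loss function directly. (The SSD forward pass only needs the image; the processing here exists to satisfy the loss computation.) For a single image (3-D, CHW), the loss function expects labels in the following format:
gclasses: the ground-truth class assigned to each anchor box
A list whose length is the number of SSD feature layers f; each element is a Tensor of shape (grid rows) × (grid columns) × (anchor boxes per grid cell) for that layer
gscores: the IoU between each anchor box and its matched ground-truth box; gclasses records the class of that same ground-truth box
A list whose length is the number of SSD feature layers f; each element is a Tensor of shape (grid rows) × (grid columns) × (anchor boxes per grid cell) for that layer
glocalisations: the position correction of each anchor box relative to its matched ground-truth box; because there are 4 coordinates, it has one extra dimension
A list whose length is the number of SSD feature layers f; each element is a Tensor of shape (grid rows) × (grid columns) × (anchor boxes per grid cell) × 4 for that layer
To compute these labels, the function is called as follows (train_ssd_network.py):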
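As a quick sanity check on these shapes, the sketch below lists the per-image label shapes for the 300×300 VGG configuration. The feature-map sizes and anchors-per-cell counts are read off the batched tensor shapes printed later in this post; the names `FEAT_SHAPES`, `ANCHORS_PER_CELL` and `label_shapes` are illustrative only and do not exist in the repository.

```python
# Feature-map sizes (rows, cols) and anchor boxes per grid cell for vgg_300,
# taken from the batched tensor shapes printed later in this post.
FEAT_SHAPES = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
ANCHORS_PER_CELL = [4, 6, 6, 6, 4, 4]


def label_shapes(feat_shapes, anchors_per_cell):
    """Return the per-layer shapes of (gclasses, glocalisations, gscores)."""
    gclasses = [(m, n, k) for (m, n), k in zip(feat_shapes, anchors_per_cell)]
    glocalisations = [shape + (4,) for shape in gclasses]  # 4 coordinates per box
    gscores = list(gclasses)                               # one IoU per box
    return gclasses, glocalisations, gscores


gclasses_shapes, glocalisations_shapes, gscores_shapes = label_shapes(
    FEAT_SHAPES, ANCHORS_PER_CELL)

# Total number of anchor boxes over all feature layers.
total_anchors = sum(m * n * k for m, n, k in gclasses_shapes)
print(gclasses_shapes[0], glocalisations_shapes[0], total_anchors)
```

Each of the three label lists therefore has one entry per feature layer, and only glocalisations carries the trailing dimension of size 4.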
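# Three lists of f Tensors each, with shapes (m, m, k), (m, m, k, 4) [x, y, w, h] and (m, m, k);
# f is the number of feature layers SSD predicts from.
# Contents: class ids 0-20, encoded box offsets for the loss, IoU values.
gclasses, glocalisations, gscores = \
    ssd_net.bboxes_encode(glabels, gbboxes, ssd_anchors)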
The inputs all come from functions covered in the previous sections (train_ssd_network.py):
ssd_anchors = ssd_net.anchors(ssd_shape)  # build the anchor boxes for every feature layer
# Pre-processing image, labels and bboxes.
# Shapes: image 'CHW', glabels (n,), gbboxes (n, 4)
image, glabels, gbboxes = \
    image_preprocessing_fn(image, glabels, gbboxes,
                           out_shape=ssd_shape,       # (300, 300)
                           data_format=DATA_FORMAT)   # 'NCHW'
Now let us look at how this function is implemented. It works feature layer by feature layer: it first creates three lists, then computes the three Tensors for each feature layer, and finally appends them to the corresponding lists (ssd_common.py):
def tf_ssd_bboxes_encode(labels,
                         bboxes,
                         anchors,
                         num_classes,
                         no_annotation_label,
                         ignore_threshold=0.5,
                         prior_scaling=(0.1, 0.1, 0.2, 0.2),
                         dtype=tf.float32,
                         scope='ssd_bboxes_encode'):
    with tf.name_scope(scope):
        target_labels = []
        target_localizations = []
        target_scores = []
        # anchors_layer: (y, x, h, w)
        for i, anchors_layer in enumerate(anchors):
            with tf.name_scope('bboxes_encode_block_%i' % i):
                # Shapes: (m, m, k), (m, m, k, 4) [x, y, w, h], (m, m, k)
                t_labels, t_loc, t_scores = \
                    tf_ssd_bboxes_encode_layer(labels, bboxes, anchors_layer,
                                               num_classes, no_annotation_label,
                                               ignore_threshold,
                                               prior_scaling, dtype)
                target_labels.append(t_labels)
                target_localizations.append(t_loc)
                target_scores.append(t_scores)
        return target_labels, target_localizations, target_scores
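The per-layer processing is the key part (ssd_common.py). It also shows why normalizing every box dimension is so convenient: anchor boxes from any layer can be compared with the ground-truth boxes directly, since all of them are relative positions in the range 0 to 1: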
# To aid understanding: m is the number of grid rows/columns in this layer,
# k the number of anchor boxes per grid cell, n the number of objects in the image.
def tf_ssd_bboxes_encode_layer(labels,                 # (n,)
                               bboxes,                 # (n, 4)
                               anchors_layer,          # y(m, m, 1), x(m, m, 1), h(k,), w(k,)
                               num_classes,
                               no_annotation_label,
                               ignore_threshold=0.5,
                               prior_scaling=(0.1, 0.1, 0.2, 0.2),
                               dtype=tf.float32):
    """Encode groundtruth labels and bounding boxes using SSD anchors from
    one layer.
    Arguments:
      labels: 1D Tensor(int64) containing groundtruth labels;
      bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
      anchors_layer: Numpy array with layer anchors;
      matching_threshold: Threshold for positive match with groundtruth bboxes;
      prior_scaling: Scaling of encoded coordinates.
    Return:
      (target_labels, target_localizations, target_scores): Target Tensors.
    """
    # Anchors coordinates and volume.
    yref, xref, href, wref = anchors_layer  # y(m, m, 1), x(m, m, 1), h(k,), w(k,)
    ymin = yref - href / 2.  # broadcasts to (m, m, k)
    xmin = xref - wref / 2.
    ymax = yref + href / 2.
    xmax = xref + wref / 2.
    vol_anchors = (xmax - xmin) * (ymax - ymin)  # anchor box areas, (m, m, k)
    # Initialize tensors...
    # Every tensor below has shape (m, m, k): one entry per anchor box.
    shape = (yref.shape[0], yref.shape[1], href.size)  # (m, m, k)
    feat_labels = tf.zeros(shape, dtype=tf.int64)  # (m, m, k)
    feat_scores = tf.zeros(shape, dtype=dtype)
    feat_ymin = tf.zeros(shape, dtype=dtype)
    feat_xmin = tf.zeros(shape, dtype=dtype)
    feat_ymax = tf.ones(shape, dtype=dtype)
    feat_xmax = tf.ones(shape, dtype=dtype)

    def jaccard_with_anchors(bbox):
        """Compute jaccard score between a box and the anchors.
        """
        int_ymin = tf.maximum(ymin, bbox[0])  # (m, m, k)
        int_xmin = tf.maximum(xmin, bbox[1])
        int_ymax = tf.minimum(ymax, bbox[2])
        int_xmax = tf.minimum(xmax, bbox[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        # Volumes.
        # Relate the anchor boxes to the ground-truth bbox.
        inter_vol = h * w  # intersection area
        union_vol = vol_anchors - inter_vol \
            + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])  # union area
        jaccard = tf.div(inter_vol, union_vol)  # intersection / union, i.e. IoU
        return jaccard  # (m, m, k)

    def condition(i, feat_labels, feat_scores,
                  feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Condition: check label index.
        """
        r = tf.less(i, tf.shape(labels))
        return r[0]  # tf.shape(labels) is a 1-D tensor, so r is 1-D too; take its only element

    def body(i, feat_labels, feat_scores,
             feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Body: update feature labels, scores and bboxes.
        Follow the original SSD paper for that purpose:
          - assign values when jaccard > 0.5;
          - only update if beat the score of other bboxes.
        """
        # Jaccard score.
        label = labels[i]  # label of the i-th object in the current image
        bbox = bboxes[i]   # ground-truth bbox of the i-th object in the current image
        jaccard = jaccard_with_anchors(bbox)  # IoU of this bbox with every anchor of the layer, (m, m, k)
        # Mask: check threshold + scores + no annotations + num_classes.
        mask = tf.greater(jaccard, feat_scores)  # True where this IoU beats the best score so far, (m, m, k)
        # mask = tf.logical_and(mask, tf.greater(jaccard, matching_threshold))
        mask = tf.logical_and(mask, feat_scores > -0.5)
        # Guards against out-of-range labels; with valid data label is always < num_classes.
        mask = tf.logical_and(mask, label < num_classes)
        imask = tf.cast(mask, tf.int64)  # integer mask
        fmask = tf.cast(mask, dtype)     # floating-point mask
        # Update values using mask.
        # feat_labels keeps the label of the best-scoring object per anchor,
        # feat_scores keeps that best score.
        # (m, m, k) * scalar label + (1 - (m, m, k)) * (m, m, k)
        # Where imask is 1 the current object beats all earlier ones;
        # everywhere else the previous value is kept.
        feat_labels = imask * label + (1 - imask) * feat_labels
        # Update the score: use this object's IoU where mask is True, else keep the old value.
        feat_scores = tf.where(mask, jaccard, feat_scores)
        # The four tensors below store the ground-truth box coordinates of the matched object.
        # (m, m, k) * scalar coordinate + (1 - (m, m, k)) * (m, m, k)
        feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
        feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
        feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
        feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax
        return [i+1, feat_labels, feat_scores,
                feat_ymin, feat_xmin, feat_ymax, feat_xmax]

    # Main loop definition.
    # Loop over every object in the current image.
    i = 0
    (i,
     feat_labels, feat_scores,
     feat_ymin, feat_xmin,
     feat_ymax, feat_xmax) = tf.while_loop(condition, body,
                                           [i,
                                            feat_labels, feat_scores,
                                            feat_ymin, feat_xmin,
                                            feat_ymax, feat_xmax])
    # Transform to center / size.
    # Here y, x, h, w describe the ground-truth box matched to each anchor.
    feat_cy = (feat_ymax + feat_ymin) / 2.
    feat_cx = (feat_xmax + feat_xmin) / 2.
    feat_h = feat_ymax - feat_ymin
    feat_w = feat_xmax - feat_xmin
    # Encode features.
    # prior_scaling: (0.1, 0.1, 0.2, 0.2); dividing by it scales the offsets by 10 and 5.
    # ((m, m, k) - (m, m, 1)) / (k,) * 10
    # Offset of the ground-truth centre from the anchor centre, in units of the anchor's h and w.
    feat_cy = (feat_cy - yref) / href / prior_scaling[0]
    feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
    # log((m, m, k) / (k,)) * 5
    # Ground-truth height/width divided by anchor height/width, then the logarithm.
    feat_h = tf.log(feat_h / href) / prior_scaling[2]
    feat_w = tf.log(feat_w / wref) / prior_scaling[3]
    # Use SSD ordering: x / y / w / h instead of ours. (m, m, k, 4)
    # Stacking on axis=-1 adds a new last dimension of size 4.
    feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)
    return feat_labels, feat_localizations, feat_scores
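The matching logic of the loop body can be tried outside a TensorFlow graph. The following is a minimal NumPy sketch, not the repository's code: `make_anchors_layer` and `match` are hypothetical helpers, the anchor sizes are made up, and the extra mask terms (`feat_scores > -0.5`, `label < num_classes`) are left out. As in the listing above, no IoU threshold is applied at this stage.

```python
import numpy as np


def make_anchors_layer(m, sizes):
    """Hypothetical anchors for an m x m grid: y(m, m, 1), x(m, m, 1), h(k,), w(k,)."""
    centers = (np.arange(m, dtype=np.float32) + 0.5) / m
    y, x = np.meshgrid(centers, centers, indexing='ij')
    h = np.array([s[0] for s in sizes], dtype=np.float32)
    w = np.array([s[1] for s in sizes], dtype=np.float32)
    return y[..., None], x[..., None], h, w


def jaccard_with_anchors(anchors_layer, bbox):
    """IoU between one box (ymin, xmin, ymax, xmax) and all anchors, shape (m, m, k)."""
    yref, xref, href, wref = anchors_layer
    ymin, xmin = yref - href / 2., xref - wref / 2.   # broadcast to (m, m, k)
    ymax, xmax = yref + href / 2., xref + wref / 2.
    inter_h = np.maximum(np.minimum(ymax, bbox[2]) - np.maximum(ymin, bbox[0]), 0.)
    inter_w = np.maximum(np.minimum(xmax, bbox[3]) - np.maximum(xmin, bbox[1]), 0.)
    inter = inter_h * inter_w
    union = (ymax - ymin) * (xmax - xmin) - inter \
        + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    return inter / union


def match(anchors_layer, labels, bboxes):
    """Keep, per anchor, the label and IoU of the best-overlapping object."""
    yref, _, href, _ = anchors_layer
    shape = (yref.shape[0], yref.shape[1], href.size)
    feat_labels = np.zeros(shape, dtype=np.int64)
    feat_scores = np.zeros(shape, dtype=np.float32)
    for label, bbox in zip(labels, bboxes):
        iou = jaccard_with_anchors(anchors_layer, bbox)
        mask = iou > feat_scores                      # this object beats the best so far
        feat_labels = np.where(mask, label, feat_labels)
        feat_scores = np.where(mask, iou, feat_scores)
    return feat_labels, feat_scores


# 2 x 2 grid with two square anchors per cell (sizes 0.5 and 0.25).
anchors = make_anchors_layer(2, [(0.5, 0.5), (0.25, 0.25)])
labels = np.array([7, 12])
bboxes = np.array([[0.0, 0.0, 0.5, 0.5],    # coincides with anchor 0 of cell (0, 0)
                   [0.5, 0.5, 1.0, 1.0]],   # coincides with anchor 0 of cell (1, 1)
                  dtype=np.float32)
feat_labels, feat_scores = match(anchors, labels, bboxes)
print(feat_labels[..., 0])
print(feat_scores[..., 0])
```

Anchors that overlap no object keep label 0 and score 0, which is how background positions end up in the labels.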
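The last few lines of tf_ssd_bboxes_encode_layer show that feat_localizations records the position correction. It does not store the raw difference between the anchor box and the ground-truth box; it stores the values in the form the loss function needs. The purpose of the prior_scaling step is not obvious from the code. The values match the variance setting (0.1, 0.1, 0.2, 0.2) of the prior boxes in the original Caffe SSD implementation, and their effect is simply to rescale the regression targets. Either way it does not hurt the loss: the loss is still smallest when the corrected anchor box coincides with the ground-truth box.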
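To see that the scaling loses no information, the encoding can be written for a single anchor and inverted. This is a sketch under assumptions: `encode` mirrors the last lines of tf_ssd_bboxes_encode_layer for one box, while `decode` is written here as its algebraic inverse for illustration and is not quoted from the repository. The anchor and box values are made up.

```python
import numpy as np

PRIOR_SCALING = (0.1, 0.1, 0.2, 0.2)


def encode(gt, anchor, prior_scaling=PRIOR_SCALING):
    """gt: (ymin, xmin, ymax, xmax); anchor: (cy, cx, h, w).

    Returns the offsets in SSD ordering (cx, cy, w, h).
    """
    ymin, xmin, ymax, xmax = gt
    yref, xref, href, wref = anchor
    cy = ((ymax + ymin) / 2. - yref) / href / prior_scaling[0]
    cx = ((xmax + xmin) / 2. - xref) / wref / prior_scaling[1]
    h = np.log((ymax - ymin) / href) / prior_scaling[2]
    w = np.log((xmax - xmin) / wref) / prior_scaling[3]
    return np.array([cx, cy, w, h])


def decode(loc, anchor, prior_scaling=PRIOR_SCALING):
    """Inverse of encode: recover (ymin, xmin, ymax, xmax) from the offsets."""
    cx, cy, w, h = loc
    yref, xref, href, wref = anchor
    cy = cy * prior_scaling[0] * href + yref
    cx = cx * prior_scaling[1] * wref + xref
    h = href * np.exp(h * prior_scaling[2])
    w = wref * np.exp(w * prior_scaling[3])
    return np.array([cy - h / 2., cx - w / 2., cy + h / 2., cx + w / 2.])


anchor = (0.5, 0.5, 0.2, 0.4)            # centre (0.5, 0.5), height 0.2, width 0.4
gt = (0.35, 0.30, 0.65, 0.80)            # a ground-truth box near the anchor
loc = encode(gt, anchor)
restored = decode(loc, anchor)           # should reproduce gt
zero = encode((0.4, 0.3, 0.6, 0.7), anchor)  # ground truth identical to the anchor
print(loc, restored, zero)
```

A ground-truth box identical to its anchor encodes to (approximately) all zeros, and any other box round-trips through `decode`, so the scaling only changes the magnitude of the regression targets.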
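II. Assembling Batches
Building the batch queue
So far all of our data describes a single image, and it has to be assembled into Tensors with a batch dimension. There is one small complication: our data consists mostly of lists of Tensors, so adding the batch dimension takes a small trick (tf_utils.py):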
def reshape_list(l, shape=None):
    """Reshape list of (list): 1D to 2D or the other way around.
    Args:
      l: List or List of list.
      shape: 1D or 2D shape.
    Return
      Reshaped list.
    """
    r = []
    if shape is None:
        # Flatten everything.
        for a in l:
            if isinstance(a, (list, tuple)):
                r = r + list(a)
            else:
                r.append(a)
    else:
        # Reshape to list of list.
        i = 0
        for s in shape:
            if s == 1:
                r.append(l[i])
            else:
                r.append(l[i:i+s])
            i += s
    return r
This function converts between list1: [Tensor11, [Tensor21, Tensor22, ...], [Tensor31, Tensor32, ...], ...] and list2: [Tensor1, Tensor2, ...]. All it needs is the length of each sub-list in list1, with a bare Tensor counted as 1 (train_ssd_network.py):
batch_shape = [1] + [len(ssd_anchors)] * 3  # (1, f, f, f), f = number of feature layers
# Training batches and queue.
r = tf.train.batch(  # image, anchor classes, encoded box offsets, scores
    tf_utils.reshape_list([image, gclasses, glocalisations, gscores]),
    batch_size=FLAGS.batch_size,  # 32
    num_threads=FLAGS.num_preprocessing_threads,
    capacity=5 * FLAGS.batch_size)
b_image, b_gclasses, b_glocalisations, b_gscores = \
    tf_utils.reshape_list(r, batch_shape)
# Intermediate queueing: unique batch computation pipeline for all
# GPUs running the training.
batch_queue = slim.prefetch_queue.prefetch_queue(
    tf_utils.reshape_list([b_image, b_gclasses, b_glocalisations, b_gscores]),
    capacity=2 * deploy_config.num_clones)
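tf.train.batch takes its input as [Tensor1, Tensor2, ...], so the function above first flattens the input. tf.train.batch then turns the per-image labels into batched labels, and the result is regrouped into the nested layout. (In effect, list1 is flattened to list2, every Tensor gains a leading batch dimension, and the result is converted back to the layout of list1.) Finally the batched Tensors are flattened once more and used to create the queue. This is more work than necessary: the version below also runs without error and skips the back-and-forth regrouping: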
batch_shape = [1] + [len(ssd_anchors)] * 3  # (1, f, f, f), f = number of feature layers
r = tf.train.batch(  # image, anchor classes, encoded box offsets, scores
    tf_utils.reshape_list([image, gclasses, glocalisations, gscores]),
    batch_size=FLAGS.batch_size,  # 32
    num_threads=FLAGS.num_preprocessing_threads,
    capacity=5 * FLAGS.batch_size)
# Intermediate queueing: unique batch computation pipeline for all
# GPUs running the training.
batch_queue = slim.prefetch_queue.prefetch_queue(
    r,  # <----- the flat list can be passed as is; no regrouping is needed
    capacity=2 * deploy_config.num_clones)
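The flatten/regroup round trip can be checked without TensorFlow. The sketch below repeats reshape_list from tf_utils.py so that it runs on its own, and uses plain strings as stand-ins for Tensors; the three-layer setup and the string names are made up for illustration.

```python
def reshape_list(l, shape=None):
    """Copy of reshape_list from tf_utils.py: flatten, or regroup by `shape`."""
    r = []
    if shape is None:
        # Flatten everything.
        for a in l:
            if isinstance(a, (list, tuple)):
                r = r + list(a)
            else:
                r.append(a)
    else:
        # Reshape to list of list.
        i = 0
        for s in shape:
            if s == 1:
                r.append(l[i])
            else:
                r.append(l[i:i+s])
            i += s
    return r


# Strings stand in for Tensors; pretend the network has 3 feature layers.
image = 'image'
gclasses = ['c0', 'c1', 'c2']
glocalisations = ['l0', 'l1', 'l2']
gscores = ['s0', 's1', 's2']

# list1 -> list2: the flat layout that tf.train.batch expects.
flat = reshape_list([image, gclasses, glocalisations, gscores])

# list2 -> list1: one bare element, then three groups of len(gclasses) elements.
batch_shape = [1] + [len(gclasses)] * 3
nested = reshape_list(flat, batch_shape)
print(flat)
print(nested)
```

One detail follows from the `s == 1` branch: a group of length 1 comes back as a bare element, not as a one-element list, so with a single feature layer the regrouped labels would no longer be lists.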
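Dequeuing a batch
# Dequeue batch.
b_image, b_gclasses, b_glocalisations, b_gscores = \
    tf_utils.reshape_list(batch_queue.dequeue(), batch_shape)  # regroup the flat list
After dequeuing, only the list layout needs to be restored. The resulting tensors look like this (using vgg_300 as the example):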
<tf.Tensor 'batch:0' shape=(32, 3, 300, 300) dtype=float32>
[<tf.Tensor 'batch:1' shape=(32, 38, 38, 4) dtype=int64>,
<tf.Tensor 'batch:2' shape=(32, 19, 19, 6) dtype=int64>,
<tf.Tensor 'batch:3' shape=(32, 10, 10, 6) dtype=int64>,
<tf.Tensor 'batch:4' shape=(32, 5, 5, 6) dtype=int64>,
<tf.Tensor 'batch:5' shape=(32, 3, 3, 4) dtype=int64>,
<tf.Tensor 'batch:6' shape=(32, 1, 1, 4) dtype=int64>]
[<tf.Tensor 'batch:7' shape=(32, 38, 38, 4, 4) dtype=float32>,
<tf.Tensor 'batch:8' shape=(32, 19, 19, 6, 4) dtype=float32>,
<tf.Tensor 'batch:9' shape=(32, 10, 10, 6, 4) dtype=float32>,
<tf.Tensor 'batch:10' shape=(32, 5, 5, 6, 4) dtype=float32>,
<tf.Tensor 'batch:11' shape=(32, 3, 3, 4, 4) dtype=float32>,
<tf.Tensor 'batch:12' shape=(32, 1, 1, 4, 4) dtype=float32>]
[<tf.Tensor 'batch:13' shape=(32, 38, 38, 4) dtype=float32>,
<tf.Tensor 'batch:14' shape=(32, 19, 19, 6) dtype=float32>,
<tf.Tensor 'batch:15' shape=(32, 10, 10, 6) dtype=float32>,
<tf.Tensor 'batch:16' shape=(32, 5, 5, 6) dtype=float32>,
<tf.Tensor 'batch:17' shape=(32, 3, 3, 4) dtype=float32>,
<tf.Tensor 'batch:18' shape=(32, 1, 1, 4) dtype=float32>]
The data now matches what the network and the loss function expect, so we can run them:
# Construct SSD network.
# This instance method wraps the ssd_arg_scope function defined earlier
# (two of its parameters can be set here).
arg_scope = ssd_net.arg_scope(weight_decay=FLAGS.weight_decay,
                              data_format=DATA_FORMAT)
with slim.arg_scope(arg_scope):
    # predictions: (BS, H, W, 4, 21)
    # localisations: (BS, H, W, 4, 4)
    # logits: (BS, H, W, 4, 21)
    predictions, localisations, logits, end_points = \
        ssd_net.net(b_image, is_training=True)
# Add loss function.
ssd_net.losses(logits, localisations,
               b_gclasses, b_glocalisations, b_gscores,
               match_threshold=FLAGS.match_threshold,  # .5
               negative_ratio=FLAGS.negative_ratio,    # 3
               alpha=FLAGS.loss_alpha,                 # 1
               label_smoothing=FLAGS.label_smoothing)  # .0
The forward-pass function returns the relevant nodes, and the loss function adds its values to the loss collection.
