『TensorFlow』SSD Source Code Study, Part 6: Label Preparation

Reposted article, 2018-07-23 16:59. Category: Engineering and Algorithm Implementation / TensorFlow

Forked project repository: SSD

1. Generating the Input Labels

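After preprocessing, the image, class labels, and ground-truth boxes are still in a fairly raw form and cannot be passed to the loss function directly as target labels. (The SSD forward network needs only the image; the processing here exists to satisfy the loss computation.) For a single image (3-D, CHW), the loss function expects labels in the following format: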

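gclasses: the ground-truth class assigned to each anchor box

    A list whose length is the number of SSD feature layers f. Each element is a Tensor of shape: rows of anchor centers in that layer × columns × anchor boxes per center

gscores: the IoU between each anchor box and its matched ground-truth box; gclasses records the class of that same ground-truth box

    A list whose length is the number of SSD feature layers f. Each element is a Tensor of shape: rows of anchor centers in that layer × columns × anchor boxes per center

glocalisations: the position correction between each anchor box and its matched ground-truth box; there are 4 coordinates, so the Tensor has one extra dimension

    A list whose length is the number of SSD feature layers f. Each element is a Tensor of shape: rows of anchor centers in that layer × columns × anchor boxes per center × 4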

To compute these labels, the following call is made (train_ssd_network.py):

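            # f tensors of (m,m,k), f tensors of (m,m,k,4) in xywh order, f tensors of (m,m,k); f is the number of SSD feature layers
            # class ids 0-20, encoded box coordinates for the loss, IoU values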
            gclasses, glocalisations, gscores = \
                ssd_net.bboxes_encode(glabels, gbboxes, ssd_anchors)

The input variables are all outputs of functions covered in the previous parts (train_ssd_network.py):

ssd_anchors = ssd_net.anchors(ssd_shape)  # call the ssd_net method that creates the anchor boxes
            
# Pre-processing image, labels and bboxes.
# 'CHW' (n,) (n, 4)
image, glabels, gbboxes = \
        image_preprocessing_fn(image, glabels, gbboxes,
                               out_shape=ssd_shape,  # (300,300)
                               data_format=DATA_FORMAT)  # 'NCHW'

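With that in place, let us look at how this function is implemented. The work is split by SSD feature layer: three lists are created first, then for each feature layer the three Tensors of that layer are computed and appended to the corresponding lists (ssd_common.py):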

def tf_ssd_bboxes_encode(labels,
                         bboxes,
                         anchors,
                         num_classes,
                         no_annotation_label,
                         ignore_threshold=0.5,
                         prior_scaling=(0.1, 0.1, 0.2, 0.2),
                         dtype=tf.float32,
                         scope='ssd_bboxes_encode'):
    with tf.name_scope(scope):
        target_labels = []
        target_localizations = []
        target_scores = []
        # anchors_layer: (y, x, h, w)
        for i, anchors_layer in enumerate(anchors):
            with tf.name_scope('bboxes_encode_block_%i' % i):
                # (m,m,k), xywh (m,m,k,4), (m,m,k)
                t_labels, t_loc, t_scores = \
                    tf_ssd_bboxes_encode_layer(labels, bboxes, anchors_layer,
                                               num_classes, no_annotation_label,
                                               ignore_threshold,
                                               prior_scaling, dtype)
                target_labels.append(t_labels)
                target_localizations.append(t_loc)
                target_scores.append(t_scores)
        return target_labels, target_localizations, target_scores

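The per-layer processing is the key part (ssd_common.py). Here the benefit of normalizing all box coordinates becomes clear: anchor boxes from different layers can be compared directly with the ground-truth boxes, since all of them are relative positions in the range 0 to 1: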

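# To aid understanding: m is the number of rows/columns of anchor centers in this layer, k is the number of anchor boxes per center, n is the number of objects in the image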
def tf_ssd_bboxes_encode_layer(labels,         # (n,)
                               bboxes,         # (n, 4)
                               anchors_layer,  # y(m, m, 1), x(m, m, 1), h(k,), w(k,)
                               num_classes,
                               no_annotation_label,
                               ignore_threshold=0.5,
                               prior_scaling=(0.1, 0.1, 0.2, 0.2),
                               dtype=tf.float32):
    """Encode groundtruth labels and bounding boxes using SSD anchors from
    one layer.

    Arguments:
      labels: 1D Tensor(int64) containing groundtruth labels;
      bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
      anchors_layer: Numpy array with layer anchors;
      matching_threshold: Threshold for positive match with groundtruth bboxes;
      prior_scaling: Scaling of encoded coordinates.

    Return:
      (target_labels, target_localizations, target_scores): Target Tensors.
    """
    # Anchors coordinates and volume.
    yref, xref, href, wref = anchors_layer  # y(m, m, 1), x(m, m, 1), h(k,), w(k,)
    ymin = yref - href / 2.  # (m, m, k)
    xmin = xref - wref / 2.
    ymax = yref + href / 2.
    xmax = xref + wref / 2.
    vol_anchors = (xmax - xmin) * (ymax - ymin)  # anchor box areas, (m, m, k)

    # Initialize tensors...
    # Each tensor below has shape (m, m, k): one entry per anchor box in this layer
    shape = (yref.shape[0], yref.shape[1], href.size)  # (m, m, k)
    feat_labels = tf.zeros(shape, dtype=tf.int64)  # (m, m, k)
    feat_scores = tf.zeros(shape, dtype=dtype)

    feat_ymin = tf.zeros(shape, dtype=dtype)
    feat_xmin = tf.zeros(shape, dtype=dtype)
    feat_ymax = tf.ones(shape, dtype=dtype)
    feat_xmax = tf.ones(shape, dtype=dtype)

    def jaccard_with_anchors(bbox):
        """Compute jaccard score between a box and the anchors.
        """
        int_ymin = tf.maximum(ymin, bbox[0])  # (m, m, k)
        int_xmin = tf.maximum(xmin, bbox[1])
        int_ymax = tf.minimum(ymax, bbox[2])
        int_xmax = tf.minimum(xmax, bbox[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        # Volumes.
        # Relate the anchor boxes to bbox
        inter_vol = h * w  # intersection area
        union_vol = vol_anchors - inter_vol \
            + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])  # union area
        jaccard = tf.div(inter_vol, union_vol)  # intersection / union, i.e. IoU
        return jaccard  # (m, m, k)

    def condition(i, feat_labels, feat_scores,
                  feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Condition: check label index.
        """
        r = tf.less(i, tf.shape(labels))
        return r[0]  # tf.shape(labels) is a 1-D tensor, so r is 1-D as well; take its single element

    def body(i, feat_labels, feat_scores,
             feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        """Body: update feature labels, scores and bboxes.
        Follow the original SSD paper for that purpose:
          - assign values when jaccard > 0.5;
          - only update if beat the score of other bboxes.
        """
        # Jaccard score.
        label = labels[i]  # label of the i-th object in the current image
        bbox = bboxes[i]   # ground-truth box of the i-th object in the current image
        jaccard = jaccard_with_anchors(bbox)  # IoU of this object's bbox with every anchor box of this layer, (m, m, k)
        # Mask: check threshold + scores + no annotations + num_classes.
        mask = tf.greater(jaccard, feat_scores)  # boolean mask, True where the IoU beats the best score so far, (m, m, k)
        # mask = tf.logical_and(mask, tf.greater(jaccard, matching_threshold))
        mask = tf.logical_and(mask, feat_scores > -0.5)
        mask = tf.logical_and(mask, label < num_classes)  # presumably a guard against out-of-range labels; a valid label is always below num_classes
        imask = tf.cast(mask, tf.int64)  # integer mask
        fmask = tf.cast(mask, dtype)     # floating-point mask

        # Update values using mask.
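        # feat_labels keeps, for each position, the label of the best-scoring object; feat_scores keeps that score
        # (m, m, k) × current class scalar + (1 - (m, m, k)) × (m, m, k)
        # Update the labels: imask is 1 only where the current object beats the previous best; other positions are unchanged
        feat_labels = imask * label + (1 - imask) * feat_labels
        # Update the scores: where mask is True use the IoU of the current object, otherwise keep the old value
        feat_scores = tf.where(mask, jaccard, feat_scores)

        # The four tensors below store the ground-truth box coordinates of the matched label
        # (m, m, k) × current box coordinate scalar + (1 - (m, m, k)) × (m, m, k)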
        feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
        feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
        feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
        feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax

        return [i+1, feat_labels, feat_scores,
                feat_ymin, feat_xmin, feat_ymax, feat_xmax]
    # Main loop definition.
    # Loop over every object in the current image
    i = 0
    (i,
     feat_labels, feat_scores,
     feat_ymin, feat_xmin,
     feat_ymax, feat_xmax) = tf.while_loop(condition, body,
                                           [i,
                                            feat_labels, feat_scores,
                                            feat_ymin, feat_xmin,
                                            feat_ymax, feat_xmax])
    # Transform to center / size.
    # y, x, h, w here refer to the ground-truth box matched to each position
    feat_cy = (feat_ymax + feat_ymin) / 2.
    feat_cx = (feat_xmax + feat_xmin) / 2.
    feat_h = feat_ymax - feat_ymin
    feat_w = feat_xmax - feat_xmin

    # Encode features.
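    # prior_scaling: [0.1, 0.1, 0.2, 0.2]; the purpose is not obvious from the code. The values appear to
    # match the prior-box variances of the original Caffe SSD and only rescale the targets.
    # ((m, m, k) - (m, m, 1)) / (k,) * 10
    # Offset of the ground-truth center from the anchor center, in units of the anchor h/w
    feat_cy = (feat_cy - yref) / href / prior_scaling[0]
    feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
    # log((m, m, k) / (k,)) * 5
    # Ground-truth height/width divided by anchor height/width, then the logarithm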
    feat_h = tf.log(feat_h / href) / prior_scaling[2]
    feat_w = tf.log(feat_w / wref) / prior_scaling[3]
    # Use SSD ordering: x / y / w / h instead of ours.(m, m, k, 4)
    feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)  # stacking on a new last axis adds the dimension of size 4

    return feat_labels, feat_localizations, feat_scores
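
The matching logic of the function above can be tried out without TensorFlow. The following is only a minimal NumPy sketch under stated assumptions: the 2×2 layer with one anchor per cell, the helper name match_layer, and the sample labels are made up for illustration, and the `feat_scores > -0.5` and `label < num_classes` guards of the original are left out.

```python
import numpy as np

def jaccard_with_anchors(bbox, yref, xref, href, wref):
    """IoU between one box (ymin, xmin, ymax, xmax) and all anchors of a layer."""
    ymin = yref - href / 2.0   # (m, m, 1) broadcast with (k,) -> (m, m, k)
    xmin = xref - wref / 2.0
    ymax = yref + href / 2.0
    xmax = xref + wref / 2.0
    vol_anchors = (xmax - xmin) * (ymax - ymin)
    h = np.maximum(np.minimum(ymax, bbox[2]) - np.maximum(ymin, bbox[0]), 0.0)
    w = np.maximum(np.minimum(xmax, bbox[3]) - np.maximum(xmin, bbox[1]), 0.0)
    inter = h * w
    union = vol_anchors - inter + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
    return inter / union

def match_layer(labels, bboxes, yref, xref, href, wref):
    """Hypothetical helper: keep, per anchor, the label and IoU of the best box."""
    shape = (yref.shape[0], yref.shape[1], href.size)
    feat_labels = np.zeros(shape, dtype=np.int64)
    feat_scores = np.zeros(shape, dtype=np.float64)
    for label, bbox in zip(labels, bboxes):
        jaccard = jaccard_with_anchors(bbox, yref, xref, href, wref)
        mask = jaccard > feat_scores          # strictly better than the best so far
        feat_labels = np.where(mask, label, feat_labels)
        feat_scores = np.where(mask, jaccard, feat_scores)
    return feat_labels, feat_scores

# Toy layer: 2x2 anchor centers, one square anchor of side 0.5 per center.
m = 2
y, x = np.mgrid[0:m, 0:m]
yref = ((y + 0.5) / m)[..., None]   # (2, 2, 1), centers at 0.25 and 0.75
xref = ((x + 0.5) / m)[..., None]
href = np.array([0.5])
wref = np.array([0.5])

labels = np.array([7, 12])
bboxes = np.array([[0.0, 0.0, 0.5, 0.5],    # coincides with the top-left anchor
                   [0.5, 0.5, 1.0, 1.0]])   # coincides with the bottom-right anchor
feat_labels, feat_scores = match_layer(labels, bboxes, yref, xref, href, wref)
print(feat_labels[..., 0])
print(feat_scores[..., 0])
```

Anchors that overlap no ground-truth box keep label 0 and score 0, which is how the background class arises in this scheme.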

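As the last few lines of tf_ssd_bboxes_encode_layer show, feat_localizations records the position correction. It does not store the raw difference between anchor box and ground-truth box; it stores the values in the form the loss function expects. The purpose of the prior_scaling step is not obvious from the code. The values appear to correspond to the prior-box variances of the original Caffe SSD implementation, and dividing by them simply rescales the targets (×10 for the center offsets, ×5 for the log size ratios). Intuitively it does no harm to the loss: the loss is still lowest when the corrected anchor box coincides with the ground-truth box.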
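
To make the encoding concrete, here is a small NumPy sketch of the same arithmetic for a single anchor and a single ground-truth box, together with its inverse. The names encode and decode and the sample numbers are assumptions chosen for illustration; only the formulas follow the code above.

```python
import numpy as np

PRIOR_SCALING = (0.1, 0.1, 0.2, 0.2)

def encode(box, anchor, prior_scaling=PRIOR_SCALING):
    """box: (ymin, xmin, ymax, xmax); anchor: (cy, cx, h, w).
    Returns the target in SSD ordering (tx, ty, tw, th)."""
    ymin, xmin, ymax, xmax = box
    yref, xref, href, wref = anchor
    cy = (ymax + ymin) / 2.0
    cx = (xmax + xmin) / 2.0
    h = ymax - ymin
    w = xmax - xmin
    ty = (cy - yref) / href / prior_scaling[0]
    tx = (cx - xref) / wref / prior_scaling[1]
    th = np.log(h / href) / prior_scaling[2]
    tw = np.log(w / wref) / prior_scaling[3]
    return np.array([tx, ty, tw, th])

def decode(target, anchor, prior_scaling=PRIOR_SCALING):
    """Inverse of encode: recover (ymin, xmin, ymax, xmax)."""
    tx, ty, tw, th = target
    yref, xref, href, wref = anchor
    cy = ty * href * prior_scaling[0] + yref
    cx = tx * wref * prior_scaling[1] + xref
    h = href * np.exp(th * prior_scaling[2])
    w = wref * np.exp(tw * prior_scaling[3])
    return np.array([cy - h / 2.0, cx - w / 2.0, cy + h / 2.0, cx + w / 2.0])

anchor = (0.5, 0.5, 0.2, 0.2)        # cy, cx, h, w
box = (0.42, 0.40, 0.66, 0.64)       # center (0.54, 0.52), height and width 0.24

target = encode(box, anchor)         # roughly [1.0, 2.0, 0.91, 0.91]
restored = decode(target, anchor)    # recovers the original box
zero_target = encode((0.4, 0.4, 0.6, 0.6), anchor)  # box equal to the anchor
print(target, restored, zero_target)
```

A box that coincides with its anchor encodes to a target of all zeros, and the scaling factors cancel out in the round trip, which is consistent with the observation that prior_scaling only changes the magnitude of the regression targets.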

2. Assembling Batches

Building the batch data queue

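Up to this point all our data describes a single image and has to be assembled into Tensors with a leading batch dimension. There is a small complication: the data mostly consists of lists of Tensors, so adding the batch dimension takes a little trick (tf_utils.py):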

def reshape_list(l, shape=None):
    """Reshape list of (list): 1D to 2D or the other way around.

    Args:
      l: List or List of list.
      shape: 1D or 2D shape.
    Return
      Reshaped list.
    """
    r = []
    if shape is None:
        # Flatten everything.
        for a in l:
            if isinstance(a, (list, tuple)):
                r = r + list(a)
            else:
                r.append(a)
    else:
        # Reshape to list of list.
        i = 0
        for s in shape:
            if s == 1:
                r.append(l[i])
            else:
                r.append(l[i:i+s])
            i += s
    return r

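This function converts between list1: [Tensor11, [Tensor21, Tensor22, ...], [Tensor31, Tensor32, ...], ...] and list2: [Tensor1, Tensor2, ...]. All that is needed is a record of the length of each sublist in list1, with a single Tensor counted as 1 (train_ssd_network.py):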

            batch_shape = [1] + [len(ssd_anchors)] * 3  # (1, f, f, f), f = number of feature layers

            # Training batches and queue.
            r = tf.train.batch(  # image, classes per anchor, box targets, scores
                tf_utils.reshape_list([image, gclasses, glocalisations, gscores]),
                batch_size=FLAGS.batch_size,  # 32
                num_threads=FLAGS.num_preprocessing_threads,
                capacity=5 * FLAGS.batch_size)

            b_image, b_gclasses, b_glocalisations, b_gscores = \
                tf_utils.reshape_list(r, batch_shape)

            # Intermediate queueing: unique batch computation pipeline for all
            # GPUs running the training.
            batch_queue = slim.prefetch_queue.prefetch_queue(
                tf_utils.reshape_list([b_image, b_gclasses, b_glocalisations, b_gscores]),
                capacity=2 * deploy_config.num_clones)

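Because tf.train.batch takes its input as [Tensor1, Tensor2, ...], the function above is first applied to flatten the input, the per-image labels are turned into batched labels, and the result is regrouped into the original format. (In effect list1 is flattened to list2, every Tensor gains a batch dimension, and the list is converted back to the list1 layout.) Finally the batched Tensors are put into a queue. None of this back and forth is strictly necessary: the version below runs without error and skips the regrouping and re-flattening: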

            batch_shape = [1] + [len(ssd_anchors)] * 3  # (1, f, f, f), f = number of feature layers
            r = tf.train.batch(  # image, classes per anchor, box targets, scores
                tf_utils.reshape_list([image, gclasses, glocalisations, gscores]),
                batch_size=FLAGS.batch_size,  # 32
                num_threads=FLAGS.num_preprocessing_threads,
                capacity=5 * FLAGS.batch_size)

            # Intermediate queueing: unique batch computation pipeline for all
            # GPUs running the training.
            batch_queue = slim.prefetch_queue.prefetch_queue(
                r,                                # <----- the flat list can be passed as is, no regrouping needed
                capacity=2 * deploy_config.num_clones)
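
The flatten-and-regroup round trip performed by reshape_list can be checked with plain Python values standing in for Tensors. This is only an illustrative sketch: the function body is a compact copy of the one shown earlier, while the string placeholders and the choice of 3 feature layers are assumptions made for the example.

```python
def reshape_list(l, shape=None):
    """Compact copy of the reshape_list function from tf_utils.py shown earlier."""
    r = []
    if shape is None:
        # Flatten one level of nesting.
        for a in l:
            if isinstance(a, (list, tuple)):
                r = r + list(a)
            else:
                r.append(a)
    else:
        # Regroup according to the recorded sublist lengths.
        i = 0
        for s in shape:
            if s == 1:
                r.append(l[i])
            else:
                r.append(l[i:i + s])
            i += s
    return r

num_layers = 3  # hypothetical; the vgg_300 model uses 6 feature layers
image = 'image'
gclasses = ['cls0', 'cls1', 'cls2']
glocalisations = ['loc0', 'loc1', 'loc2']
gscores = ['score0', 'score1', 'score2']

nested = [image, gclasses, glocalisations, gscores]
batch_shape = [1] + [num_layers] * 3          # [1, 3, 3, 3]

flat = reshape_list(nested)                   # 1 + 3 * 3 = 10 entries
restored = reshape_list(flat, batch_shape)    # back to the nested layout
print(flat)
print(restored == nested)

# Edge case: sublists of length 1 come back as bare elements, not lists.
single = ['image', ['cls0'], ['loc0'], ['score0']]
collapsed = reshape_list(reshape_list(single), [1, 1, 1, 1])
print(collapsed)
```

One consequence of treating a length of 1 as "single element" is that a model with only one feature layer would not get its one-element lists back after regrouping. With the multi-layer SSD configurations discussed here this case does not arise.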

Dequeuing the batch data

            # Dequeue batch.
            b_image, b_gclasses, b_glocalisations, b_gscores = \
                tf_utils.reshape_list(batch_queue.dequeue(), batch_shape)  # regroup the flat list

After dequeuing, the list only needs to be regrouped. The data then has the following format (vgg_300 as an example):

<tf.Tensor 'batch:0' shape=(32, 3, 300, 300) dtype=float32>

[<tf.Tensor 'batch:1' shape=(32, 38, 38, 4) dtype=int64>,
 <tf.Tensor 'batch:2' shape=(32, 19, 19, 6) dtype=int64>,
 <tf.Tensor 'batch:3' shape=(32, 10, 10, 6) dtype=int64>,
 <tf.Tensor 'batch:4' shape=(32, 5, 5, 6) dtype=int64>,
 <tf.Tensor 'batch:5' shape=(32, 3, 3, 4) dtype=int64>,
 <tf.Tensor 'batch:6' shape=(32, 1, 1, 4) dtype=int64>]
 
[<tf.Tensor 'batch:7' shape=(32, 38, 38, 4, 4) dtype=float32>,
 <tf.Tensor 'batch:8' shape=(32, 19, 19, 6, 4) dtype=float32>,
 <tf.Tensor 'batch:9' shape=(32, 10, 10, 6, 4) dtype=float32>,
 <tf.Tensor 'batch:10' shape=(32, 5, 5, 6, 4) dtype=float32>,
 <tf.Tensor 'batch:11' shape=(32, 3, 3, 4, 4) dtype=float32>,
 <tf.Tensor 'batch:12' shape=(32, 1, 1, 4, 4) dtype=float32>]
 
[<tf.Tensor 'batch:13' shape=(32, 38, 38, 4) dtype=float32>,
 <tf.Tensor 'batch:14' shape=(32, 19, 19, 6) dtype=float32>,
 <tf.Tensor 'batch:15' shape=(32, 10, 10, 6) dtype=float32>,
 <tf.Tensor 'batch:16' shape=(32, 5, 5, 6) dtype=float32>,
 <tf.Tensor 'batch:17' shape=(32, 3, 3, 4) dtype=float32>,
 <tf.Tensor 'batch:18' shape=(32, 1, 1, 4) dtype=float32>]
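
The shapes listed above follow directly from the feature-map sizes and the number of anchor boxes per center. The sketch below rebuilds them in plain Python; the two lists are read off the printed shapes, and the variable names are chosen for illustration only.

```python
# Values read off the tensor shapes printed above (vgg_300, batch size 32).
feat_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
anchors_per_cell = [4, 6, 6, 6, 4, 4]
batch_size = 32

# gclasses and gscores: (batch, rows, cols, anchors per center)
class_shapes = [(batch_size, h, w, k)
                for (h, w), k in zip(feat_shapes, anchors_per_cell)]
score_shapes = list(class_shapes)
# glocalisations: one extra trailing dimension for the 4 encoded coordinates
loc_shapes = [s + (4,) for s in class_shapes]

# Total number of anchor boxes per image across all feature layers.
total_anchors = sum(h * w * k
                    for (h, w), k in zip(feat_shapes, anchors_per_cell))

for c, l in zip(class_shapes, loc_shapes):
    print(c, l)
print(total_anchors)
```

Summing over the six layers gives 8732 anchor boxes per image for this configuration, so each image contributes 8732 class labels, 8732 scores, and 8732 × 4 localization targets.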

The data now meets the input requirements of both the network and the loss function, so the graph can be built and run:

            # Construct SSD network.
            # This instance method wraps the ssd_arg_scope function defined earlier (two parameters can be overridden)
            arg_scope = ssd_net.arg_scope(weight_decay=FLAGS.weight_decay,
                                          data_format=DATA_FORMAT)
            with slim.arg_scope(arg_scope):
                # predictions: (BS, H, W, 4, 21)
                # localisations: (BS, H, W, 4, 4)
                # logits: (BS, H, W, 4, 21)
                predictions, localisations, logits, end_points = \
                    ssd_net.net(b_image, is_training=True)

            # Add loss function.
            ssd_net.losses(logits, localisations,
                           b_gclasses, b_glocalisations, b_gscores,
                           match_threshold=FLAGS.match_threshold,  # .5
                           negative_ratio=FLAGS.negative_ratio,  # 3
                           alpha=FLAGS.loss_alpha,  # 1
                           label_smoothing=FLAGS.label_smoothing)  # .0

The forward function returns the relevant nodes, while the loss function adds its values to the loss collection.
