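Deep Learning in Practice - Object Detection - Faster R-CNN (principles and partial code walkthrough) 1. tf.image.crop_and_resize (crop regions from a feature map by normalized coordinates and resize them) 2. tf.slice (slice data) 3. x.argsort() (sort data and return the indices) 4. np.empty (create an uninitialized array) 5. np.meshgrid (generate 2-D coordinate grids) 6. np.where (indices that satisfy a condition) 7. tf.gather (pick values by index)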

This article is a repost. 2019-04-03 19:57

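1. tf.image.crop_and_resize(net, bbox, box_ind, [14, 14], name) # Uses the normalized y1, x1, y2, x2 of each bbox to crop the matching region of net and resizes each crop to 14*14, so a single crop has shape [14, 14, 512]; with 256 boxes the output shape is [256, 14, 14, 512]

Parameters: net is the input convolutional feature map; bbox holds the normalized y1, x1, y2, x2 of each box; box_ind is a 1-D tensor with one entry per box (256 entries here) giving the index of the batch image that the box is cropped from; [14, 14] is the output size of each crop; name is the name of the op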

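2. tf.slice(x, [0, 2], [-1, 1]) # Slices x. [0, 2] is the start position (row 0, column 2); [-1, 1] is the size of the slice along each dimension, where -1 means all remaining elements, so this call takes column 2 of every row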

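import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session)

sess = tf.Session()
x = np.array([[1, 2, 3, 4],
              [2, 3, 4, 5]])
# Start at row 1, column 3; take all remaining rows and one column, then drop the size-1 dimensions
c = tf.squeeze(tf.slice(x, [1, 3], [-1, 1]))
print(sess.run(c))  # 5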
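The begin/size semantics of tf.slice can also be tried out without a TensorFlow session. The sketch below mimics them with plain NumPy indexing; the helper name slice_like_tf is made up for this illustration and is not part of either library, and it assumes that a size of -1 means "everything remaining along that axis".

```python
import numpy as np

def slice_like_tf(x, begin, size):
    """Hypothetical helper: mimic tf.slice(x, begin, size) with NumPy indexing."""
    index = []
    for axis, (b, s) in enumerate(zip(begin, size)):
        # size == -1 means "all remaining elements along this axis"
        stop = x.shape[axis] if s == -1 else b + s
        index.append(slice(b, stop))
    return x[tuple(index)]

x = np.array([[1, 2, 3, 4],
              [2, 3, 4, 5]])
col2 = slice_like_tf(x, [0, 2], [-1, 1])    # column 2 of every row
last = slice_like_tf(x, [1, 3], [-1, 1])    # row 1 onward, column 3
print(col2.tolist())  # [[3], [4]]
print(last.tolist())  # [[5]]
```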

3. x.argsort()[::-1] sorts x from largest to smallest and returns the indices rather than the values

import numpy as np

x = np.array([5, 2, 3, 4])

c = x.argsort()[::-1]
print(c)  # [0 3 2 1]
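In the detection code this pattern is typically used to reorder boxes by score and keep only the best ones. A minimal sketch, with made-up scores and boxes and an illustrative top_k of 2:

```python
import numpy as np

scores = np.array([0.2, 0.9, 0.5, 0.7])
boxes = np.array([[0, 0, 10, 10],
                  [5, 5, 20, 20],
                  [1, 1, 8, 8],
                  [3, 3, 12, 12]])

top_k = 2  # illustrative value, not taken from the article
order = scores.argsort()[::-1][:top_k]   # indices of the highest scores first
top_scores = scores[order]               # scores reordered by the same indices
top_boxes = boxes[order]                 # boxes stay aligned with their scores
print(order.tolist())       # [1, 3]
print(top_scores.tolist())  # [0.9, 0.7]
```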

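4. np.empty((5,), dtype=np.float32) creates an uninitialized array (its contents are arbitrary until they are assigned); here it can be used to build label values

Parameters: (5,) is the shape; np.float32 is the data type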

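import numpy as np

x = np.array([0.1, 0.3, 0.15, 0.5, 0.6])
label = np.empty((5,), dtype=np.float32)
label.fill(-1)       # default label: -1
label[x > 0.2] = 1   # entries above the threshold get label 1
print(label)         # [-1.  1. -1.  1.  1.]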

5. np.meshgrid(xx, yy) generates 2-D coordinate grids

Parameters: xx and yy are 1-D arrays such as the one produced by np.arange(0, 10)

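import numpy as np

xx = np.arange(0, 3)
yy = np.arange(0, 3)
xx, yy = np.meshgrid(xx, yy)
print(xx)
print(yy)
## Output:
## xx = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
## yy = [[0, 0, 0], [1, 1, 1], [2, 2, 2]]

The grid point at row i, column j is (xx[i][j], yy[i][j]): the first point is (xx[0][0], yy[0][0]) = (0, 0), and the next one along the row is (xx[0][1], yy[0][1]) = (1, 0)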

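6. np.where(x>1)[0] # Returns the indices where x is greater than 1; np.where returns a tuple with one index array per dimension, and [0] takes the array for the first dimension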

import numpy as np

x = np.array([0, 1, 2])
print(np.where(x > 1)[0])

## Output: [2]

7. tf.gather(x, [0, 3, 5]) takes the three elements of x at indices [0, 3, 5]

Parameters: x is the tensor to take values from; [0, 3, 5] are the indices

import numpy as np
import tensorflow as tf  # TensorFlow 1.x API (tf.Session)

sess = tf.Session()

x = np.arange(0, 10)
c = tf.gather(x, [0, 3, 5])
print(sess.run(c))

### Output: [0 3 5]

Note: the index list can come from np.where, which returns index values
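For comparison, the same "pick values by index" idea can be sketched in NumPy, where np.take (or plain fancy indexing) plays the role of tf.gather and np.where supplies the indices. The array values below are made up for the illustration.

```python
import numpy as np

x = np.arange(0, 10) * 10            # [0, 10, ..., 90]
picked = np.take(x, [0, 3, 5])       # same values as x[[0, 3, 5]]
print(picked.tolist())               # [0, 30, 50]

keep = np.where(x > 60)[0]           # indices where the condition holds
print(keep.tolist())                 # [7, 8, 9]
print(x[keep].tolist())              # [70, 80, 90]
```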

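Faster R-CNN adds an RPN (Region Proposal Network) on top of the original Fast R-CNN; the RPN is mainly responsible for producing the candidate boxes

An explanation based on the architecture diagram:

1. The CNN layers use a pretrained VGG network.

2. The output of the VGG convolutions is fed into the Region Proposal Network to obtain proposal boxes; the number of proposals used in the article is 300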

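Classification loss: based on the overlap (IoU) between each anchor and the ground-truth bounding boxes, anchors whose overlap is greater than 0.7, or that have the highest overlap with a ground-truth box, get label 1; anchors whose overlap is below 0.3 get label 0; those in between get label -1 and are ignored. Training on these labels teaches the RPN whether an anchor contains an object

Bounding-box regression loss: the regression offsets between the anchor box and the ground-truth box; the purpose here is to obtain the adjustment coefficients for the box (g_ is the ground-truth box, e_ is the anchor box)

dx = (g_centx - e_centx) / e_width # (ground-truth center x - anchor center x) / anchor width

dy = (g_centy - e_centy) / e_height # (ground-truth center y - anchor center y) / anchor height

dw = log(g_width / e_width) # log(ground-truth width / anchor width)

dh = log(g_height / e_height) # log(ground-truth height / anchor height)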
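The four coefficients can be checked numerically with a small NumPy sketch. The function names box_geometry, encode_deltas and decode_deltas are hypothetical, and the sketch assumes the +1 pixel convention for widths and heights (w = x2 - x1 + 1), so that decoding the coefficients reproduces the ground-truth box exactly.

```python
import numpy as np

def box_geometry(boxes):
    """Width, height and center of (x1, y1, x2, y2) boxes, +1 pixel convention."""
    w = boxes[:, 2] - boxes[:, 0] + 1.0
    h = boxes[:, 3] - boxes[:, 1] + 1.0
    cx = boxes[:, 0] + 0.5 * (w - 1)
    cy = boxes[:, 1] + 0.5 * (h - 1)
    return w, h, cx, cy

def encode_deltas(ex_boxes, gt_boxes):
    """Compute (dx, dy, dw, dh) that move each ex box onto its gt box."""
    e_w, e_h, e_cx, e_cy = box_geometry(ex_boxes)
    g_w, g_h, g_cx, g_cy = box_geometry(gt_boxes)
    dx = (g_cx - e_cx) / e_w
    dy = (g_cy - e_cy) / e_h
    dw = np.log(g_w / e_w)
    dh = np.log(g_h / e_h)
    return np.stack([dx, dy, dw, dh], axis=1)

def decode_deltas(ex_boxes, deltas):
    """Inverse of encode_deltas: apply the coefficients to the ex boxes."""
    e_w, e_h, e_cx, e_cy = box_geometry(ex_boxes)
    cx = deltas[:, 0] * e_w + e_cx
    cy = deltas[:, 1] * e_h + e_cy
    w = np.exp(deltas[:, 2]) * e_w
    h = np.exp(deltas[:, 3]) * e_h
    return np.stack([cx - 0.5 * (w - 1), cy - 0.5 * (h - 1),
                     cx + 0.5 * (w - 1), cy + 0.5 * (h - 1)], axis=1)

anchors = np.array([[0.0, 0.0, 15.0, 15.0]])
gt = np.array([[4.0, 2.0, 27.0, 17.0]])
deltas = encode_deltas(anchors, gt)
print(deltas)                          # dx = 0.5, dy = 0.125, dw = log(1.5), dh = 0
print(decode_deltas(anchors, deltas))  # recovers the ground-truth box
```

Dividing the center offsets by the anchor size and taking the log of the size ratios keeps the targets on a similar scale for small and large anchors.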

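3. Feed in the 300 proposals and obtain the box positions of 256 positive and negative samples

Classification: based on the overlap between the 300 proposals and the ground-truth bounding boxes, take the proposals whose overlap is greater than 0.75 and randomly pick at most 0.25*256 of them as positive samples; their labels are the actual object class labels, such as 2, 6, 8. Proposals with low overlap get label 0.

Bounding-box regression loss: the regression offsets between the 256 sampled boxes and the ground-truth boxes; again these are adjustment coefficients

4. Compute the normalized coordinates of the 256 sampled boxes relative to the original image and use tf.image.crop_and_resize() to convert the VGG output into a [256, 14, 14, 512] tensor; this step is called ROI pooling

5. Connect the [256, 14, 14, 512] tensor to the fully connected layers; the final fully connected layers predict the label and the box regression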
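The sampling in step 3 can be sketched as follows. This is only an illustration under stated assumptions: sample_rois and matched_labels are hypothetical names, the thresholds default to the values quoted in this article (0.75 for positives, 0.1 to 0.5 for negatives), and max_overlaps is assumed to already hold each proposal's best overlap with any ground-truth box.

```python
import numpy as np

def sample_rois(max_overlaps, matched_labels, rois_per_image=256, fg_fraction=0.25,
                fg_thresh=0.75, bg_thresh_hi=0.5, bg_thresh_lo=0.1, seed=0):
    """Pick positive and negative proposals and build their class labels."""
    rng = np.random.default_rng(seed)
    fg_inds = np.where(max_overlaps >= fg_thresh)[0]
    bg_inds = np.where((max_overlaps < bg_thresh_hi) &
                       (max_overlaps >= bg_thresh_lo))[0]
    # At most fg_fraction * rois_per_image positives (64 with the defaults)
    num_fg = min(int(fg_fraction * rois_per_image), fg_inds.size)
    if num_fg > 0:
        fg_inds = rng.choice(fg_inds, size=num_fg, replace=False)
    # Fill the rest with negatives; reuse boxes only if there are too few
    num_bg = rois_per_image - num_fg
    if bg_inds.size > 0:
        bg_inds = rng.choice(bg_inds, size=num_bg, replace=bg_inds.size < num_bg)
    keep = np.concatenate([fg_inds, bg_inds])
    labels = matched_labels[keep].copy()
    labels[num_fg:] = 0            # negatives get the background label 0
    return keep, labels

# 300 proposals: 100 overlap strongly with an object of class 6, 200 weakly
max_overlaps = np.concatenate([np.full(100, 0.8), np.full(200, 0.3)])
matched_labels = np.full(300, 6)
keep, labels = sample_rois(max_overlaps, matched_labels)
print(keep.size, int((labels > 0).sum()))   # 256 64
```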

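Code walkthrough: because there is a lot of code, here is a brief outline of the approach:

Step 1: Gather the image information, such as the image labels, the bounding-box positions and the image name

Step 2: Build the model

Step 2.1: Build the net layers; this uses the convolutional layers of a pretrained VGG16

Step 2.2: Build the RPN

Step 2.2.1: Construct the anchor boxes

Step 2.2.2: Pass the VGG16 convolutional output through a 3*3 convolution and then 1*1 convolutions, producing rpn_cls_score (the scores) and rpn_bbox_pred (the box adjustment ratios)

Step 2.3: Build the proposals

Step 2.3.1: Obtain 300 proposals whose boxes have been adjusted

Step 2.3.1.1: Use the rpn_bbox_pred obtained above to adjust the anchors into proposals

Step 2.3.1.2: Sort the scores from largest to smallest, take the indices of the top 1200, and use these indices to reorder the proposals and scores

Step 2.3.1.3: Apply NMS to drop the indices of boxes whose overlap is greater than 0.75, then keep the top 300 proposals and scores

Step 2.3.2: Build the labels for training the RPN: the cls labels and the rpn_bbox labels

Step 2.3.2.1: Keep only the anchors that lie inside the image boundary

Step 2.3.2.2: Select 256 anchors; anchors whose overlap with a ground-truth box is greater than 0.7 get label 1, anchors whose overlap is below 0.3 get label 0, and the rest get label -1

Step 2.3.2.3: Compare the in-boundary anchors with the ground-truth boxes and compute the ratios below, which serve as the rpn_bbox labels

dx = (g_centx - e_centx) / e_width # (ground-truth center x - anchor center x) / anchor width

dy = (g_centy - e_centy) / e_height # (ground-truth center y - anchor center y) / anchor height

dw = log(g_width / e_width) # log(ground-truth width / anchor width)

dh = log(g_height / e_height) # log(ground-truth height / anchor height)

Step 2.3.3: Feed in the 300 proposals, build the cls and proposal_bbox labels for training the final proposal classifier, and output the boxes of the 256 positive and negative samples

Step 2.3.3.1: Compute the overlap between the ground-truth boxes and the proposals, find for each proposal the index of the ground-truth box with the highest overlap, and build the proposal's label from that best-matching ground-truth box

Step 2.3.3.2: Take the proposals whose overlap with a ground-truth box is greater than 0.75 as positives, capped at 256*0.25 of them, and set the labels of proposals whose overlap is below 0.5 and above 0.1 to 0; these are the proposal_cls labels

Step 2.3.3.3: Compare the boxes of the 256 positive and negative samples with the ground-truth boxes and compute the ratios, which serve as the proposal_bbox labels

Step 2.3.3.4: Return the 256 rois as x1, y1, x2, y2

Step 2.4: Build the prediction: perform the final class prediction and box adjustment for the 256 input boxes

Step 2.4.1: Apply RoI pooling: x1/width, x2/width, y1/height, y2/height give the normalized coordinates used to select part of the convolutional feature map; tf.image.crop_and_resize() produces the resized feature-map crop for each box, with shape [256, 14, 14, 512]; a pooling layer then reduces this to [256, 7, 7, 512]

Step 2.4.2: Reshape the output so that it suits fully connected layers, attach 3 fully connected layers, and follow the last one with the output layers: one output is cls_score and the other is bbox_pred

Step 2.5: Build the loss function by summing the four losses described above

Step 2.6: Train the model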
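The anchor labelling rule from the outline (label 1, 0 or -1 depending on overlap) can be sketched in NumPy as below. The names iou_matrix and label_anchors are hypothetical; the sketch assumes the 0.7 / 0.3 thresholds, assumes every ground-truth box overlaps at least one anchor, and leaves out the later subsampling to 256 anchors.

```python
import numpy as np

def iou_matrix(boxes, gt_boxes):
    """IoU between every box (N, 4) and every ground-truth box (K, 4), +1 pixel convention."""
    area = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    gt_area = (gt_boxes[:, 2] - gt_boxes[:, 0] + 1) * (gt_boxes[:, 3] - gt_boxes[:, 1] + 1)
    iw = (np.minimum(boxes[:, None, 2], gt_boxes[None, :, 2])
          - np.maximum(boxes[:, None, 0], gt_boxes[None, :, 0]) + 1)
    ih = (np.minimum(boxes[:, None, 3], gt_boxes[None, :, 3])
          - np.maximum(boxes[:, None, 1], gt_boxes[None, :, 1]) + 1)
    inter = np.maximum(iw, 0) * np.maximum(ih, 0)
    return inter / (area[:, None] + gt_area[None, :] - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (object), 0 (background) or -1 (ignored) for each anchor."""
    overlaps = iou_matrix(anchors, gt_boxes)
    max_overlaps = overlaps.max(axis=1)          # best IoU of each anchor
    labels = np.empty((anchors.shape[0],), dtype=np.float32)
    labels.fill(-1)
    labels[max_overlaps < neg_thresh] = 0
    labels[overlaps.argmax(axis=0)] = 1          # best anchor for each gt box
    labels[max_overlaps >= pos_thresh] = 1
    return labels

anchors = np.array([[0, 0, 9, 9], [0, 0, 4, 4], [50, 50, 59, 59], [2, 2, 11, 11]])
gt_boxes = np.array([[0, 0, 9, 9]])
labels = label_anchors(anchors, gt_boxes)
print(labels.tolist())   # [1.0, 0.0, 0.0, -1.0]
```

The overlaps in this example are 1.0, 0.25, 0.0 and about 0.47, which is why the last anchor falls into the ignored band.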

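1. Anchor generation: the feature map has gone through four pooling layers, so it is 16 times smaller than the input image. Start from a base area of 16*16, divide it by each aspect ratio and take the square root to get w, multiply w by the ratio to get h, then enlarge w and h by each scale. np.meshgrid then generates the grid coordinates (one per feature-map position, 16 pixels apart), and adding the anchors to these grid coordinates gives the final anchors

Code: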

import numpy as np


# Generate the rectangular anchor boxes
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
                     scales=2 ** np.arange(3, 6)):
    """
    Generate anchor (reference) windows by enumerating aspect ratios X
    scales wrt a reference (0, 0, 15, 15) window.
    """
    # The base anchor for every position is (0, 0, 15, 15)
    base_anchor = np.array([1, 1, base_size, base_size]) - 1
    # Enumerate one anchor per aspect ratio
    ratio_anchors = _ratio_enum(base_anchor, ratios)
    # scales is [8, 16, 32]; enumerate every scale for each ratio anchor
    anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
                         for i in range(ratio_anchors.shape[0])])
    return anchors


def _whctrs(anchor):
    """
    Return width, height, x center, and y center for an anchor (window).
    """
    # Width of the anchor
    w = anchor[2] - anchor[0] + 1
    # Height of the anchor
    h = anchor[3] - anchor[1] + 1
    # x coordinate of the anchor center
    x_ctr = anchor[0] + 0.5 * (w - 1)
    # y coordinate of the anchor center
    y_ctr = anchor[1] + 0.5 * (h - 1)
    return w, h, x_ctr, y_ctr


def _mkanchors(ws, hs, x_ctr, y_ctr):
    """
    Given a vector of widths (ws) and heights (hs) around a center
    (x_ctr, y_ctr), output a set of anchors (windows).
    """
    # Build the anchors as (x1, y1, x2, y2) rows; ws and hs become column vectors
    ws = ws[:, np.newaxis]
    hs = hs[:, np.newaxis]
    anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
                         y_ctr - 0.5 * (hs - 1),
                         x_ctr + 0.5 * (ws - 1),
                         y_ctr + 0.5 * (hs - 1)))
    return anchors


def _ratio_enum(anchor, ratios):
    """
    Enumerate a set of anchors for each aspect ratio wrt an anchor.
    """
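    # Convert the (x1, y1, x2, y2) coordinates to w, h, x_ctr, y_ctr
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    # Area of the current anchor
    size = w * h
    # Divide the area by each aspect ratio
    size_ratios = size / ratios
    # The square root of each scaled area gives the widths ws
    ws = np.round(np.sqrt(size_ratios))
    # Multiply the widths by the ratios to get the heights hs
    hs = np.round(ws * ratios)
    # Rebuild x1, y1, x2, y2 around the center position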
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors


def _scale_enum(anchor, scales):
    """
    Enumerate a set of anchors for each scale wrt an anchor.
    """
    # Get the width, height and center of the anchor
    w, h, x_ctr, y_ctr = _whctrs(anchor)
    # Multiply w and h by each scale
    ws = w * scales
    hs = h * scales
    # Rebuild the anchors' x1, y1, x2, y2 from ws, hs, x_ctr, y_ctr
    anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
    return anchors
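The last part of the description, adding the anchors to the np.meshgrid coordinates, is not covered by the functions above. A minimal sketch of that step follows; base_anchors is a compact restatement of the ratio and scale enumeration so that the block runs on its own, and both function names, as well as the 2 x 3 feature-map size in the usage lines, are assumptions made for the illustration.

```python
import numpy as np

def base_anchors(base_size=16, ratios=(0.5, 1, 2), scales=(8, 16, 32)):
    """Compact restatement of the ratio/scale enumeration for a (0, 0, 15, 15) window."""
    ctr = 0.5 * (base_size - 1)
    anchors = []
    for r in ratios:
        w = np.round(np.sqrt(base_size * base_size / r))
        h = np.round(w * r)
        for s in scales:
            ws, hs = w * s, h * s
            anchors.append([ctr - 0.5 * (ws - 1), ctr - 0.5 * (hs - 1),
                            ctr + 0.5 * (ws - 1), ctr + 0.5 * (hs - 1)])
    return np.array(anchors)

def shifted_anchors(feat_h, feat_w, feat_stride=16):
    """Place the 9 base anchors at every feature-map position."""
    base = base_anchors(feat_stride)
    shift_x = np.arange(feat_w) * feat_stride
    shift_y = np.arange(feat_h) * feat_stride
    shift_x, shift_y = np.meshgrid(shift_x, shift_y)
    # One (dx, dy, dx, dy) row per feature-map cell, in row-major order
    shifts = np.stack([shift_x.ravel(), shift_y.ravel(),
                       shift_x.ravel(), shift_y.ravel()], axis=1)
    # Broadcast (K, 1, 4) + (1, A, 4) -> (K, A, 4), then flatten to (K * A, 4)
    return (shifts[:, None, :] + base[None, :, :]).reshape(-1, 4)

all_anchors = shifted_anchors(feat_h=2, feat_w=3)
print(all_anchors.shape)         # (54, 4): 2 * 3 positions, 9 anchors each
print(all_anchors[0].tolist())   # [-84.0, -40.0, 99.0, 55.0]
```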

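2. NMS (non-maximum suppression): sort the boxes by score, then, starting from the highest-scoring box, compute its overlap with all remaining ROIs, use np.where to find the ROIs whose overlap does not exceed the threshold, and rebuild the index array from them; repeat until no boxes remain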

import numpy as np

def py_cpu_nms(dets, thresh):
    """Pure Python NMS baseline."""
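    # x1, y1, x2, y2 of each predicted box
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]
    # Score of each box
    scores = dets[:, 4]
    # Area of each box
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    # Indices of the boxes sorted by score, highest first
    order = scores.argsort()[::-1]
    # Indices of the boxes to keep
    keep = []
    while order.size > 0:
        # Index of the highest-scoring remaining box
        i = order[0]
        # Add it to keep
        keep.append(i)
        # Left edge of the intersection: the larger of x1[i] and each other x1
        xx1 = np.maximum(x1[i], x1[order[1:]])
        # Top edge of the intersection: the larger of y1[i] and each other y1
        yy1 = np.maximum(y1[i], y1[order[1:]])
        # Right edge of the intersection: the smaller of x2[i] and each other x2
        xx2 = np.minimum(x2[i], x2[order[1:]])
        # Bottom edge of the intersection: the smaller of y2[i] and each other y2
        yy2 = np.minimum(y2[i], y2[order[1:]])
        # Width, height and area of the intersection (0 if the boxes do not overlap)
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        # IoU = intersection / (area of box i + area of the other box - intersection)
        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        # Keep only the boxes whose overlap with box i does not exceed thresh
        inds = np.where(ovr <= thresh)[0]
        # inds is relative to order[1:], hence the + 1
        order = order[inds + 1]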

    return keep
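To see what one pass of the loop does, the sketch below traces a single iteration by hand on three made-up boxes with an illustrative threshold of 0.7. It is a worked example of the overlap arithmetic, not a replacement for the function above.

```python
import numpy as np

dets = np.array([[10, 10, 50, 50, 0.9],
                 [12, 12, 52, 52, 0.8],
                 [100, 100, 150, 150, 0.7]])
thresh = 0.7  # illustrative threshold

x1, y1, x2, y2, scores = dets.T
areas = (x2 - x1 + 1) * (y2 - y1 + 1)
order = scores.argsort()[::-1]            # [0, 1, 2]: already sorted by score

# One iteration: compare the best box with all the others
i, rest = order[0], order[1:]
w = np.maximum(0.0, np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]) + 1)
h = np.maximum(0.0, np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]) + 1)
inter = w * h                             # 39 * 39 = 1521 for box 1, 0 for box 2
ovr = inter / (areas[i] + areas[rest] - inter)
inds = np.where(ovr <= thresh)[0]         # positions within rest that survive
order = order[inds + 1]                   # + 1 maps them back into order

print(ovr)             # roughly 0.826 for box 1 and 0 for box 2
print(order.tolist())  # [2]: box 1 is suppressed, box 2 goes to the next round
```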

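3. ROI pooling: the input is the positions of the 256 boxes. The positions are normalized by the image height and width, and tf.image.crop_and_resize(conv, boxes, box_ind, [14, 14]) then converts the corresponding regions of the feature map into a tensor of shape (256, 14, 14, 512)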

    def _crop_pool_layer(self, bottom, rois, name):
        # Crop and resize the feature map for each input box, i.e. RoI pooling
        with tf.variable_scope(name):
            # Batch index of each RoI (column 0 of rois)
            batch_ids = tf.squeeze(tf.slice(rois, [0, 0], [-1, 1], name="batch_id"), [1])
            # Get the normalized coordinates of bboxes; bottom_shape is the shape of the conv feature map
            bottom_shape = tf.shape(bottom)
            height = (tf.to_float(bottom_shape[1]) - 1.) * np.float32(self._feat_stride[0])
            width = (tf.to_float(bottom_shape[2]) - 1.) * np.float32(self._feat_stride[0])
            # Normalized x1, y1, x2, y2 relative to the image size
            x1 = tf.slice(rois, [0, 1], [-1, 1], name="x1") / width
            y1 = tf.slice(rois, [0, 2], [-1, 1], name="y1") / height
            x2 = tf.slice(rois, [0, 3], [-1, 1], name="x2") / width
            y2 = tf.slice(rois, [0, 4], [-1, 1], name="y2") / height
            # Won't be backpropagated to rois anyway, but to save time
            bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], axis=1))
            # 7 * 2 = 14
            pre_pool_size = cfg.FLAGS.roi_pooling_size * 2
            # Crop the (x1, y1, x2, y2) region from every feature map and resize it to 14*14; there are 512 feature maps and 256 boxes
            crops = tf.image.crop_and_resize(bottom, bboxes, tf.to_int32(batch_ids), [pre_pool_size, pre_pool_size], name="crops")
        # Apply one max-pooling step
        return slim.max_pool2d(crops, [2, 2], padding='SAME')
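The coordinate normalization inside _crop_pool_layer can be checked in isolation with NumPy. In the sketch below, normalize_rois is a hypothetical name and the 6 x 11 feature-map size is made up; it assumes rois rows of (batch_id, x1, y1, x2, y2) in image pixels and a feature stride of 16, and it reproduces the (H - 1) * stride denominators used above.

```python
import numpy as np

def normalize_rois(rois, feat_h, feat_w, feat_stride=16):
    """Turn pixel rois (batch_id, x1, y1, x2, y2) into normalized (y1, x1, y2, x2) boxes."""
    height = (feat_h - 1.0) * feat_stride
    width = (feat_w - 1.0) * feat_stride
    x1 = rois[:, 1:2] / width
    y1 = rois[:, 2:3] / height
    x2 = rois[:, 3:4] / width
    y2 = rois[:, 4:5] / height
    # crop_and_resize expects the order y1, x1, y2, x2
    return np.concatenate([y1, x1, y2, x2], axis=1)

# Feature map of 6 x 11 cells, i.e. an 80 x 160 pixel extent at stride 16
rois = np.array([[0, 0, 0, 160, 80],
                 [0, 40, 20, 120, 60]], dtype=np.float32)
boxes = normalize_rois(rois, feat_h=6, feat_w=11)
print(boxes.tolist())   # [[0.0, 0.0, 1.0, 1.0], [0.25, 0.25, 0.75, 0.75]]
```

The first roi spans the whole extent and maps to (0, 0, 1, 1); the second covers the central half in each direction.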
