[Source Code Walkthrough] YOLO v3 - 06 Testing

Reposted article. 2020-04-28 16:22. Tags: YOLOv3, source code walkthrough

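In practice, prediction consists of two main parts:

Standardizing the input image
Classifying and localizing objects from the model outputs y1, y2, y3

The YOLO object is actually created first, since constructing it builds the prediction (evaluation) computation, but the image handling is described first here.

1. Processing the input image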

import glob
import os
from PIL import Image

def detect_img(yolo):
    path = "VOCdevkit/VOC2007/SegmentationClass/*.jpg"
    outdir = "./VOCdevkit/VOC2007/SegmentationObject"
    for jpgfile in glob.glob(path):
        img = Image.open(jpgfile)
        img = yolo.detect_image(img)   # e.g. a 731*575 image
        img.save(os.path.join(outdir, os.path.basename(jpgfile)))
    yolo.close_session()

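The input image is processed inside the yolo.detect_image(img) call in the loop of detect_img.

def detect_image(self, image):

A. Obtain the image data, image_data:

Check that the configured input size is divisible by 32.
Scale the input picture proportionally so that it fits the configured model_image_size (here 416*416).

In letterbox_image(), the procedure is the same proportional scaling used for y_true (see https://www.cnblogs.com/monologuesmw/p/12794278.html): the resized image is pasted onto a gray canvas so that it sits in the center. The only difference is that there are no ground-truth boxes to adjust here.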

if self.model_image_size != (None, None):   # a fixed model input size has been configured
    assert self.model_image_size[0] % 32 == 0, 'Multiples of 32 required'    # these checks apply to the configured model size,
    assert self.model_image_size[1] % 32 == 0, 'Multiples of 32 required'    # not to the image (the image may be any size)
    boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))    # scale the image into the configured size
else:
    new_image_size = (image.width - (image.width % 32),
                      image.height - (image.height % 32))
    boxed_image = letterbox_image(image, new_image_size)
image_data = np.array(boxed_image, dtype='float32')
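
The proportional scaling can be sketched in plain Python. This is a minimal illustration of the geometry only, not the letterbox_image() implementation itself: the helper name letterbox_geometry is hypothetical, and it uses integer arithmetic where an implementation would more typically multiply by a floating-point scale factor.

```python
def letterbox_geometry(image_size, target_size):
    """Return (new_w, new_h, dx, dy) for fitting an image into a padded canvas.

    Hypothetical helper for illustration; both sizes are (width, height).
    """
    iw, ih = image_size
    w, h = target_size
    # Pick the smaller of w/iw and h/ih, compared with integers so that
    # floating-point rounding cannot change the result.
    if w * ih <= h * iw:          # width is the limiting side
        nw, nh = w, ih * w // iw
    else:                         # height is the limiting side
        nw, nh = iw * h // ih, h
    # Offsets that center the resized image on the gray canvas.
    dx, dy = (w - nw) // 2, (h - nh) // 2
    return nw, nh, dx, dy


print(letterbox_geometry((731, 575), (416, 416)))  # (416, 327, 0, 44)
```

For the 731*575 example image, the resized picture is 416*327 and is pasted 44 pixels below the top of the 416*416 canvas.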

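B. Normalize the image data and reshape it to the dimensions the computation expects

Expand the array shape by adding one dimension at axis 0, i.e. 416*416*3 --> 1*416*416*3.

After this expansion the input matches the network's input format [batch_size, height, width, channels].

image_data /= 255.  # normalize pixel values to [0, 1]
image_data = np.expand_dims(image_data, 0)  # Add batch dimension.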

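C. Feed the data through the model to obtain boxes and class information

The core of this step is the graph built by generate() when the YOLO object is initialized. By this point only one box is returned per detected object, because the results have already gone through score filtering and NMS.

out_boxes, out_scores, out_classes = self.sess.run(
    [self.boxes, self.scores, self.classes],
    feed_dict={
        self.yolo_model.input: image_data,     # image data
        self.input_image_shape: [image.size[1], image.size[0]],    # original image size as (height, width)
        K.learning_phase(): 0    # learning phase: 0 = test, 1 = train
    })     # this evaluates the tensors that generate() built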

D. Draw the boxes, classes and class scores on the original image

Set the font and line thickness

font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
            size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))   # font, sized relative to the image height
thickness = (image.size[0] + image.size[1]) // 300  # line thickness

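Loop over the output classes and draw each detection

for i, c in reversed(list(enumerate(out_classes))):

Get the box, class and score

predicted_class = self.class_names[c]  # predicted class name
box = out_boxes[i]   # box coordinates
score = out_scores[i]   # box score (box confidence * class probability, not IoU)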

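Get the extent of the predicted box

top, left, bottom, right = box
top = max(0, np.floor(top + 0.5).astype('int32'))
left = max(0, np.floor(left + 0.5).astype('int32'))
bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
right = min(image.size[0], np.floor(right + 0.5).astype('int32'))

top and left are rounded to the nearest integer (add 0.5, then floor) and clamped to be at least 0;

bottom and right are rounded the same way and then clamped to the image height and width respectively.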
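
The rounding and clamping can be tried on plain numbers. A minimal sketch, assuming a hypothetical helper named clamp_box and using math.floor in place of np.floor:

```python
import math


def clamp_box(box, image_size):
    """Round a (top, left, bottom, right) box and clip it to the image.

    Hypothetical helper; image_size is (width, height), as in PIL's Image.size.
    """
    top, left, bottom, right = box
    width, height = image_size
    top = max(0, math.floor(top + 0.5))             # round, never above the image
    left = max(0, math.floor(left + 0.5))           # round, never left of the image
    bottom = min(height, math.floor(bottom + 0.5))  # round, never below the image
    right = min(width, math.floor(right + 0.5))     # round, never right of the image
    return top, left, bottom, right


print(clamp_box((-3.2, 50.4, 600.2, 740.9), (731, 575)))  # (0, 50, 575, 731)
```

A box that spills over the image edge is pulled back to the edge, so the rectangle drawn later always lies inside the picture.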

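print(label, (left, top), (right, bottom))    # box corners; label is built earlier in the loop (not shown here)

Adjust the text position according to where the predicted box sits; label_size is, for example, [110, 20]

if top - label_size[1] >= 0:   # e.g. 101 - 20 >= 0: the label fits above the box
    text_origin = np.array([left, top - label_size[1]])
else:
    text_origin = np.array([left, top + 1])   # otherwise place it just inside the top edge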
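
The two branches are easy to check with concrete numbers. A small sketch, with label_origin as a hypothetical stand-in for the branch above:

```python
def label_origin(left, top, label_height):
    """Return the (x, y) of the label's top-left corner (hypothetical helper)."""
    if top - label_height >= 0:
        return (left, top - label_height)   # enough room: label sits above the box
    return (left, top + 1)                  # no room above: label goes inside the box


print(label_origin(50, 101, 20))  # (50, 81)
print(label_origin(50, 5, 20))    # (50, 6)
```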

Drawing

for i in range(thickness):   # nested 1-pixel rectangles give a border that is `thickness` pixels wide
    draw.rectangle(
        [left + i, top + i, right - i, bottom - i],
        outline=self.colors[c])
draw.rectangle(
    [tuple(text_origin), tuple(text_origin + label_size)],
    fill=self.colors[c])   # filled background for the label
draw.text(text_origin, label, fill=(0, 0, 0), font=font)

2. Model prediction

It all starts with the initialization of the YOLO() class.

detect_img(YOLO())

Everything about model prediction hinges on the generate() call on the last line of __init__.

def __init__(self, **kwargs):
    self.__dict__.update(self._defaults)  # set up default values
    self.__dict__.update(kwargs)   # and update with user overrides
    self.class_names = self._get_class()   # load class_names from the configured path
    self.anchors = self._get_anchors()   # load the anchor widths and heights, reshaped into two columns
    self.sess = K.get_session()
    self.boxes, self.scores, self.classes = self.generate()
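
The first two lines of __init__ implement a defaults-then-overrides pattern. A minimal standalone sketch; the class name Detector and the default values shown are hypothetical, chosen only for illustration:

```python
class Detector:
    # Hypothetical defaults, not the values used by the YOLO class.
    _defaults = {"score": 0.3, "iou": 0.45}

    def __init__(self, **kwargs):
        self.__dict__.update(self._defaults)  # start from the class-level defaults
        self.__dict__.update(kwargs)          # keyword arguments override them


d = Detector(score=0.5)
print(d.score, d.iou)  # 0.5 0.45
```

Each instance gets its own copy of the values, so overriding one setting on an instance leaves the class-level _defaults untouched.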

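The functions used inside generate are organized as follows (shown as a mind map in the original):

yolo_head: converts the model's final-layer output into center coordinates, width, height, box confidence and class probabilities, i.e. it evaluates the YOLO decoding formulas.
yolo_correct_boxes: converts the boxes to coordinates on the real image.
yolo_boxes_and_scores: besides calling the two functions above, it also computes the box score:

    box score = box confidence * class probability

and returns the boxes together with their scores.

yolo_eval: besides calling the functions above, it filters boxes by comparing their score against a threshold and applies NMS to suppress boxes that mark the same object more than once.

Next, we work from the bottom layer up.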

A. yolo_head()

def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):
    """Convert final layer features to bounding box parameters."""

Get the number of anchors and build a tensor from them

num_anchors = len(anchors)
# Reshape to batch, height, width, num_anchors, box_params.
anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])

Create the grid

This grid is used later to turn the predicted offsets into the coordinates of each center point.

It has two parts: a grid of x indices and a grid of y indices.
For a 13*13 feature map, the indices in each part run from 0 to 12.
The two parts are then concatenated into a 13*13*1*2 tensor.

grid_shape = K.shape(feats)[1:3]  # height, width: the grid size, e.g. 13*13
grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
    [1, grid_shape[1], 1, 1])   # row indices 0..12, repeated across every column
grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
    [grid_shape[0], 1, 1, 1])   # column indices 0..12, repeated down every row
grid = K.concatenate([grid_x, grid_y])   # last axis holds (x, y)
grid = K.cast(grid, K.dtype(feats))  # shape (13, 13, 1, 2)
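
What the tiling produces is easier to see with nested lists. A minimal sketch, assuming a hypothetical make_grid helper and leaving out the size-1 anchor axis for readability:

```python
def make_grid(height, width):
    """Return grid[y][x] == [x, y] for every cell of a height*width grid."""
    return [[[x, y] for x in range(width)] for y in range(height)]


grid = make_grid(13, 13)
print(grid[0][0], grid[0][12], grid[12][0])  # [0, 0] [12, 0] [0, 12]
```

Each cell stores its own column and row index, which is exactly the offset added to the sigmoid output when the center coordinates are decoded.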

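Expand the last dimension so that the per-anchor predictions are separated from the grid dimensions

feats = K.reshape(
    feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])

Compute

center coordinates
width and height
box confidence
class probabilities

box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
box_confidence = K.sigmoid(feats[..., 4:5])  # box confidence
box_class_probs = K.sigmoid(feats[..., 5:])   # class probabilities; every grid cell is computed in one vectorized pass

The center coordinates are expressed relative to the grid size.

The width and height are expressed relative to the input size (416*416).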
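
To make the decoding concrete, here is the same arithmetic for a single anchor in a single grid cell. This is a sketch in plain Python rather than Keras tensors; the function name decode_cell and the example anchor (116, 90) are assumptions made for illustration.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(t, col, row, grid_size, anchor, input_size):
    """Decode raw outputs (tx, ty, tw, th) for one anchor in one grid cell.

    grid_size and input_size are (width, height); anchor is (width, height)
    in input-image pixels. Every result is a fraction of the input image.
    """
    tx, ty, tw, th = t
    x = (sigmoid(tx) + col) / grid_size[0]         # center x, relative to the grid
    y = (sigmoid(ty) + row) / grid_size[1]         # center y, relative to the grid
    w = math.exp(tw) * anchor[0] / input_size[0]   # width, relative to the input
    h = math.exp(th) * anchor[1] / input_size[1]   # height, relative to the input
    return x, y, w, h


x, y, w, h = decode_cell((0.0, 0.0, 0.0, 0.0), col=6, row=6,
                         grid_size=(13, 13), anchor=(116, 90),
                         input_size=(416, 416))
print(x, y, w, h)
```

With all raw outputs at zero, the sigmoid gives 0.5, so the center lands in the middle of cell (6, 6), which is the middle of the image, and the box takes exactly the anchor's size scaled by 1/416.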

B. yolo_correct_boxes()

def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape):
    '''Convert to real image coordinates'''

Flip the coordinate order from (x, y) to (y, x)

box_yx = box_xy[..., ::-1]
box_hw = box_wh[..., ::-1]

Get the size of the image after proportional scaling, without the gray borders: new_shape

input_shape = K.cast(input_shape, K.dtype(box_yx))
image_shape = K.cast(image_shape, K.dtype(box_yx))
new_shape = K.round(image_shape * K.min(input_shape/image_shape))

Map the box centers and sizes from the padded input back to the original image

offset = (input_shape-new_shape)/2./input_shape   # gray border on each side, as a fraction of the input
scale = input_shape/new_shape
box_yx = (box_yx - offset) * scale
box_hw *= scale

Convert the center and size into the four corner coordinates y_min, x_min, y_max, x_max

box_mins = box_yx - (box_hw / 2.)
box_maxes = box_yx + (box_hw / 2.)
boxes =  K.concatenate([
    box_mins[..., 0:1],  # y_min
    box_mins[..., 1:2],  # x_min
    box_maxes[..., 0:1],  # y_max
    box_maxes[..., 1:2]  # x_max
])

# Scale the normalized boxes to pixel coordinates on the original image.
boxes *= K.concatenate([image_shape, image_shape])
return boxes     # boxes are now in original-image coordinates
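
The correction can be followed for a single box in plain Python. A minimal sketch, assuming a hypothetical correct_box helper that mirrors the steps above for one box and includes the final scaling to pixels; shapes are (height, width).

```python
def correct_box(box_yx, box_hw, input_shape, image_shape):
    """Map one normalized box from the padded input back to image pixels.

    Hypothetical helper; returns (y_min, x_min, y_max, x_max).
    """
    ratio = min(input_shape[0] / image_shape[0], input_shape[1] / image_shape[1])
    new_shape = [round(s * ratio) for s in image_shape]   # resized image, no borders
    mins, maxes = [], []
    for i in range(2):                                     # i = 0: y axis, i = 1: x axis
        offset = (input_shape[i] - new_shape[i]) / 2.0 / input_shape[i]
        scale = input_shape[i] / new_shape[i]
        center = (box_yx[i] - offset) * scale              # fraction of the original image
        size = box_hw[i] * scale
        mins.append((center - size / 2.0) * image_shape[i])
        maxes.append((center + size / 2.0) * image_shape[i])
    return mins[0], mins[1], maxes[0], maxes[1]


# A box centered in the 416*416 input, for a 731-wide, 575-high image.
print(correct_box((0.5, 0.5), (0.2, 0.2), (416, 416), (575, 731)))
```

The box stays centered on the original image (y = 287.5, x = 365.5). Its height grows by the factor 416/327 before scaling to pixels, because the image occupies only 327 of the 416 input rows, while the width is unaffected since the image fills the input horizontally.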

C. yolo_boxes_and_scores()

def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape):
    '''Process Conv layer output'''

Run A and B

box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats,
    anchors, num_classes, input_shape)
boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape)
boxes = K.reshape(boxes, [-1, 4])

Compute the box scores

    box score = box confidence * class probability

box_scores = box_confidence * box_class_probs   # box score = box confidence * class probability
box_scores = K.reshape(box_scores, [-1, num_classes])
return boxes, box_scores   # boxes on the original image, as four corner coordinates
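
For one box, the score computation is a simple elementwise product. A small sketch with made-up numbers; the helper name box_scores_for and the input values are illustrative, while 0.6 is the default score_threshold of yolo_eval:

```python
def box_scores_for(box_confidence, class_probs):
    """Per-class scores for one box: confidence times each class probability."""
    return [box_confidence * p for p in class_probs]


scores = box_scores_for(0.9, [0.1, 0.8, 0.05])
passing = [c for c, s in enumerate(scores) if s >= 0.6]
print(passing)  # [1]
```

Only class 1 (score about 0.72) reaches the threshold, so this box would survive as a candidate for class 1 alone.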

D. yolo_eval()

def yolo_eval(yolo_outputs,     # feature maps from the network's output layers
              anchors,
              num_classes,
              image_shape,
              max_boxes=20,
              score_threshold=.6,
              iou_threshold=.5):
    """Evaluate YOLO model on given input and return filtered boxes."""

Initialize the values used below:

num_layers = len(yolo_outputs)
anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]] # default setting
input_shape = K.shape(yolo_outputs[0])[1:3] * 32      # three output scales with strides (32, 16, 8) give 13, 26, 52 grids; 13 * 32 = 416
boxes = []                                            # each output scale uses 3 anchor boxes
box_scores = []
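
How anchor_mask distributes the anchors over the three output scales can be shown with a short sketch. The anchor values below are the commonly used YOLOv3 defaults and are an assumption here, since the real values are read from the anchors file; anchors_per_layer is a hypothetical helper.

```python
# Assumed default YOLOv3 anchors as (width, height), smallest to largest.
anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]


def anchors_per_layer(anchors, anchor_mask, input_size=416, strides=(32, 16, 8)):
    """Return a list of (grid_size, anchors) pairs, one per output layer."""
    layers = []
    for mask, stride in zip(anchor_mask, strides):
        grid = input_size // stride
        layers.append((grid, [anchors[i] for i in mask]))
    return layers


for grid, layer_anchors in anchors_per_layer(anchors, anchor_mask):
    print(f"{grid}x{grid} grid: anchors {layer_anchors}")
```

The coarsest 13*13 grid receives the three largest anchors and the finest 52*52 grid the three smallest, so large objects are handled at the coarse scale and small objects at the fine scale.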

input_shape is 416*416 here.

Collect the boxes and box scores layer by layer:

for l in range(num_layers):
    _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l],
        anchors[anchor_mask[l]], num_classes, input_shape, image_shape)   # process the conv layer output (calls yolo_head internally)
    boxes.append(_boxes)
    box_scores.append(_box_scores)

Concatenate boxes and box_scores across the layers:

boxes = K.concatenate(boxes, axis=0)   # shape (?, 4): four corner values per box, as in the annotations
box_scores = K.concatenate(box_scores, axis=0)

Build a boolean mask that records whether each box score reaches the threshold; it is used to filter out low-scoring boxes

mask = box_scores >= score_threshold     # threshold the scores from all three scales: one boolean per box and class

Initialization:

max_boxes_tensor = K.constant(max_boxes, dtype='int32')
boxes_ = []
scores_ = []
classes_ = []

Filter the boxes further, class by class:

for c in range(num_classes):

Drop the boxes whose score is below the threshold:

class_boxes = tf.boolean_mask(boxes, mask[:, c])   # keep only the boxes that pass the threshold for class c
class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])

Apply NMS (removes redundant boxes that mark the same object)

nms_index = tf.image.non_max_suppression(
    class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)  # non-maximum suppression: removes duplicate boxes on the same object

class_boxes = K.gather(class_boxes, nms_index)   # pick the rows of class_boxes at the indices in nms_index
class_box_scores = K.gather(class_box_scores, nms_index)
classes = K.ones_like(class_box_scores, 'int32') * c
boxes_.append(class_boxes)
scores_.append(class_box_scores)
classes_.append(classes)
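
The idea behind the NMS step can be illustrated with a plain greedy version. This is a sketch of the general technique, not TensorFlow's implementation of tf.image.non_max_suppression; the function names iou and nms are chosen here for illustration, and boxes use the (y_min, x_min, y_max, x_max) layout.

```python
def iou(a, b):
    """Intersection over union of two (y_min, x_min, y_max, x_max) boxes."""
    ih = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iw = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ih * iw
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, max_boxes, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_boxes:
            break
        # Keep box i only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, max_boxes=20))  # [0, 2]
```

The second box overlaps the first with an IoU of 81/119 (about 0.68), which exceeds 0.5, so it is suppressed; the third box does not overlap either and is kept.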

The loop ends here.

boxes_ = K.concatenate(boxes_, axis=0)
scores_ = K.concatenate(scores_, axis=0)
classes_ = K.concatenate(classes_, axis=0)
return boxes_, scores_, classes_

generate loads the model, assigns each class its own box color, and then calls yolo_eval().
