YOLOv3 Network Structure Analysis


Reposted from: https://blog.csdn.net/KKKSQJ/article/details/83587138


Understanding the principles and code details of YOLOv3, based on keras-yolo3

GitHub source code used in this article: https://github.com/qqwweee/keras-yolo3

Yolov3 paper address: https://pjreddie.com/media/files/papers/YOLOv3.pdf

Yolov3 official website: https://pjreddie.com/darknet/yolo/

I have recently become very interested in YOLOv3, read a lot of material, and built a few related projects, so I am writing down what I learned for later reference.

YOLO, short for You Only Look Once, is an object detection algorithm based on a convolutional neural network (CNN).

 

Yolo design concept

 

YOLO as a whole uses a single CNN to perform end-to-end object detection. The process is shown in Figure 1.

Figure 1

 

Specifically (based on YOLOV3)

1: Take an input image of any size and, keeping its aspect ratio unchanged, scale it so that its width or height becomes 416. Then paste the scaled image onto a 416*416 canvas, which becomes the network input. That is, the network input is a 416*416, 3-channel RGB image.

2: Run the network. YOLO's CNN divides the image into S*S grid cells (YOLOv3 predicts at multiple scales and outputs 3 layers, each with S*S cells: 13*13, 26*26, and 52*52). Each cell is responsible for detecting the objects whose center points fall inside it, as shown in Figure 2. Each cell predicts 3*(4+1+B) values, so for a layer divided into S*S cells the final prediction is a tensor of size S*S*3*(4+1+B). Here B is the number of classes (80 for COCO, so B=80), 3 is the number of anchor boxes per layer, 4 is the bounding box position and size (x, y, w, h), and 1 is the confidence.

3: Apply non-maximum suppression (NMS) to filter the boxes, output the boxes (class_boxes) and their confidences (class_box_scores), generate the class information (classes), and return the final detections.
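
To make step 1 concrete, here is a minimal sketch of the letterbox geometry only (no actual image resizing). The function name letterbox_geometry and the 640*480 example size are made up for illustration; the arithmetic assumes the scaled image is pasted in the center of the canvas.

```python
def letterbox_geometry(image_w, image_h, target_w=416, target_h=416):
    """Return the resized size and the padding offsets of a letterboxed image."""
    # A single scale factor for both axes keeps the aspect ratio unchanged.
    scale = min(target_w / image_w, target_h / image_h)
    new_w = int(image_w * scale)
    new_h = int(image_h * scale)
    # The resized image sits in the center; the remaining area is padding.
    pad_x = (target_w - new_w) // 2
    pad_y = (target_h - new_h) // 2
    return new_w, new_h, pad_x, pad_y


# A 640*480 image is scaled by 0.65 and padded at the top and bottom.
print(letterbox_geometry(640, 480))  # (416, 312, 0, 52)
```

The longer side reaches 416 exactly, and the shorter side is padded equally on both sides.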

          

Figure 2, Figure 3

 

 YOLOV3 network structure:

 

 

 

Multiscale:

YOLOv3 predicts at three scales: 13*13, 26*26, and 52*52.

• Small scale: (13*13 feature map)

  • The network takes a 416*416 image and downsamples it through 5 stride-2 convolutions (416 / 2^5 = 13), producing a 13*13 output.

• Medium scale: (26*26 feature map)

  • A convolutional layer near the end of the small-scale branch (the penultimate one) is upsampled (x2) and concatenated with the last 26x26 feature map of the backbone, producing a 26*26 output.

• Large scale: (52*52 feature map)

  • The same operation as for the medium scale, producing a 52*52 output.
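
The grid sizes and output tensor shapes follow directly from the strides. The sketch below just does that arithmetic; the constant and function names are chosen for illustration, and the values (416 input, 3 anchors per scale, 80 COCO classes) are the ones used in this article.

```python
INPUT_SIZE = 416
STRIDES = (32, 16, 8)      # total downsampling factor at each output scale
ANCHORS_PER_SCALE = 3
NUM_CLASSES = 80           # COCO


def output_shapes(input_size=INPUT_SIZE, strides=STRIDES,
                  anchors_per_scale=ANCHORS_PER_SCALE, num_classes=NUM_CLASSES):
    """Return (grid, grid, channels) for each output scale."""
    # Each anchor predicts 4 box values, 1 confidence and one score per class.
    channels = anchors_per_scale * (4 + 1 + num_classes)
    return [(input_size // s, input_size // s, channels) for s in strides]


shapes = output_shapes()
total_boxes = sum(g * g * ANCHORS_PER_SCALE for g, _, _ in shapes)
print(shapes)       # [(13, 13, 255), (26, 26, 255), (52, 52, 255)]
print(total_boxes)  # 10647
```

So a single 416*416 image yields 10647 candidate boxes before any filtering.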

Benefit: the network learns deep and shallow features at the same time. The idea is related to the passthrough layer of YOLOv2, which stacks adjacent features of the shallower feature map into different channels instead of spatial locations (similar to identity mappings in ResNet): a 26x26x512 feature map is rearranged into 13x13x2048 and concatenated with the original deep feature map. In YOLOv3 the fusion runs in the other direction (the deep map is upsampled and concatenated with the shallower one), but the effect is the same: the model gains fine-grained features and gets better at recognizing small objects.
 

 

Anchor box:

YOLOv3 uses 9 anchor boxes in total, obtained by k-means clustering. On the COCO dataset, the nine clusters are: (10*13); (16*30); (33*23); (30*61); (62*45); (59*119); (116*90); (156*198); (373*326).

Feature maps of different sizes use prior (anchor) boxes of different sizes.

  • 13*13 feature map corresponds to [(116*90), (156*198), (373*326)]
  • 26*26 feature map corresponds to [(30*61), (62*45), (59*119)]
  • 52*52 feature map corresponds to [(10*13), (16*30), (33*23)]

Reason: the larger the feature map, the smaller its receptive field and the more sensitive it is to small objects, so it gets the small anchor boxes.

The smaller the feature map, the larger its receptive field and the more sensitive it is to large objects, so it gets the large anchor boxes.
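
The assignment can be written as a small lookup. This is only an illustrative sketch: the index lists mirror the anchor_mask used in the keras-yolo3 code discussed later, while the helper name anchors_per_grid is made up here.

```python
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45), (59, 119),
           (116, 90), (156, 198), (373, 326)]
# Indices into ANCHORS for the 13*13, 26*26 and 52*52 outputs, in that order.
ANCHOR_MASK = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
GRID_SIZES = [13, 26, 52]


def anchors_per_grid():
    """Map each grid size to the three anchor boxes it uses."""
    return {grid: [ANCHORS[i] for i in mask]
            for grid, mask in zip(GRID_SIZES, ANCHOR_MASK)}


for grid, boxes in anchors_per_grid().items():
    print(grid, boxes)
```

Because the anchors are sorted from small to large, the coarsest grid simply takes the last three entries and the finest grid the first three.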

 

Bounding box prediction:

The network predicts tx, ty, tw, th:

  • Apply sigmoid to tx and ty, and add the corresponding cell offset (cx, cy below)
  • Apply exp to tw and th, and multiply by the corresponding anchor size
  • Multiply by the corresponding stride (416/13, 416/26, or 416/52) to convert values expressed in grid units into input-image pixels
  • Finally, apply sigmoid to the objectness and class scores to get probabilities in 0~1. Sigmoid replaces the softmax of the previous version because softmax inflates the largest class probability and suppresses the others.
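
Written out, the first two steps are the box equations from the YOLOv3 paper, where σ is the sigmoid function and the symbols are the ones defined just below:

```latex
\begin{aligned}
b_x &= \sigma(t_x) + c_x \\
b_y &= \sigma(t_y) + c_y \\
b_w &= p_w \, e^{t_w} \\
b_h &= p_h \, e^{t_h}
\end{aligned}
```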

       

(tx, ty): the raw offset of the object's center relative to the top-left corner of the grid cell containing it. After the sigmoid it lies in [0, 1]; in the figure it is (0.3, 0.4).

(cx, cy): the offset, counted in grid cells, of that cell's top-left corner from the top-left corner of the image; in the figure it is (1, 1).

(pw, ph): the width and height of the anchor box.

(tw, th): the predicted width and height terms of the box, applied to the anchor through exp.

PS: the final box coordinates are bx, by, bw, bh; what the network learns to predict is tx, ty, tw, th.
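
As a worked example, here is a minimal decoding sketch for a single prediction. It assumes that cx, cy are cell indices and that the anchor sizes pw, ph are already in input pixels (as in the anchor list above), so only the center needs to be multiplied by the stride. The function name decode_box is made up for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """Decode raw outputs into (center x, center y, width, height) in input pixels."""
    bx = (sigmoid(tx) + cx) * stride   # offset inside the cell plus cell index, scaled to pixels
    by = (sigmoid(ty) + cy) * stride
    bw = pw * math.exp(tw)             # anchor width scaled by exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# All-zero outputs put the center in the middle of cell (1, 1) and keep the anchor size.
print(decode_box(0.0, 0.0, 0.0, 0.0, cx=1, cy=1, pw=116, ph=90, stride=32))
# (48.0, 48.0, 116.0, 90.0)
```

The sigmoid keeps the predicted center inside its own cell, which is why each cell is only responsible for objects whose center falls within it.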

 

Loss function

  • YOLOv3 replaces the softmax loss used in YOLOv2 with a logistic loss

(Figure: loss function formula. It is for reference only and differs slightly from the one used in YOLOv3.)
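
The difference between the two can be seen on a toy example. The sketch below is not the loss code of keras-yolo3; it only compares independent sigmoid outputs with a softmax on made-up class scores and computes a binary cross-entropy over the classes, assuming a box that carries two labels at once.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(prob, target):
    eps = 1e-7                           # avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


logits = [2.0, 2.0, -3.0]    # hypothetical scores for three classes
targets = [1.0, 1.0, 0.0]    # two labels apply to the same box

independent = [sigmoid(v) for v in logits]   # each class scored on its own
competing = softmax(logits)                  # classes compete for probability mass
loss = sum(binary_cross_entropy(p, t) for p, t in zip(independent, targets))
print(independent, competing, loss)
```

With sigmoid both correct classes can score high at the same time, while softmax forces them to share the probability mass, so neither can exceed 0.5 here.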

 

    

 

Code walkthrough: the detection part of the source code

 

Usage

  • git clone https://github.com/qqwweee/keras-yolo3
  • Download the YOLOv3 weights from the YOLO website
  • Convert the Darknet version of the YOLO model to a Keras model
  • Run YOLO detection

 


      
      
      
              
Initialization parameters of the YOLO class:

class YOLO(object):
    _defaults = {
        "model_path": 'model_data/yolo.h5',             # trained model
        "anchors_path": 'model_data/yolo_anchors.txt',  # 9 anchor boxes, sorted from small to large
        "classes_path": 'model_data/coco_classes.txt',  # class names
        "score": 0.3,                                   # score threshold
        "iou": 0.45,                                    # IoU threshold
        "model_image_size": (416, 416),                 # input image size
        "gpu_num": 1,                                   # number of GPUs
    }

      
      
      
              
Run yolo_video.py:

def detect_img(yolo):
    while True:
        img = input('Input image filename:')  # ask for an image path
        try:
            image = Image.open(img)
        except:
            print('Open Error! Try again!')
            continue
        else:
            r_image = yolo.detect_image(image)  # run detection in yolo.detect_image
            r_image.show()
    yolo.close_session()

The detect_image() function is at line 102 of yolo.py:

def detect_image(self, image):
    start = timer()

    if self.model_image_size != (None, None):  # a fixed network input size is configured
        # model_image_size[0] and [1] are the input height and width; both must be multiples of 32.
        assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required'
        assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required'
        # letterbox_image() is defined at line 20 of utils.py. It takes the image and (w=416, h=416)
        # and returns a new image padded so that the aspect ratio stays unchanged.
        boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))
    else:
        new_image_size = (image.width - (image.width % 32),
                          image.height - (image.height % 32))
        boxed_image = letterbox_image(image, new_image_size)
    image_data = np.array(boxed_image, dtype='float32')

    print(image_data.shape)  # (416, 416, 3)
    image_data /= 255.  # normalize to [0, 1]
    # Add a batch dimension -> (1, 416, 416, 3) to match the network input format (batch, height, width, channels).
    image_data = np.expand_dims(image_data, 0)

    # boxes, scores and classes are built in generate(), at line 61 of yolo.py.
    out_boxes, out_scores, out_classes = self.sess.run(
        [self.boxes, self.scores, self.classes],
        feed_dict={  # values fed to the graph
            self.yolo_model.input: image_data,  # image data
            self.input_image_shape: [image.size[1], image.size[0]],  # original image size (height, width)
            K.learning_phase(): 0  # learning phase: 0 = test, 1 = train
        })

    print('Found {} boxes for {}'.format(len(out_boxes), 'img'))
    # Draw the boxes and class labels with Pillow; the line thickness is set automatically.

      
      
      
              
    font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
                size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))  # font
    thickness = (image.size[0] + image.size[1]) // 300  # line thickness

    for i, c in reversed(list(enumerate(out_classes))):
        predicted_class = self.class_names[c]  # class name
        box = out_boxes[i]  # box
        score = out_scores[i]  # confidence

        label = '{} {:.2f}'.format(predicted_class, score)  # label text
        draw = ImageDraw.Draw(image)  # drawing context
        # Size of the label text. Note: draw.textsize was removed in Pillow 10;
        # newer versions provide draw.textbbox instead.
        label_size = draw.textsize(label, font)

        top, left, bottom, right = box
        top = max(0, np.floor(top + 0.5).astype('int32'))
        left = max(0, np.floor(left + 0.5).astype('int32'))
        bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
        right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
        print(label, (left, top), (right, bottom))  # box corners

        if top - label_size[1] >= 0:  # put the label above the box if there is room
            text_origin = np.array([left, top - label_size[1]])
        else:
            text_origin = np.array([left, top + 1])

        # My kingdom for a good redistributable image drawing library.
        for i in range(thickness):  # draw the box outline
            draw.rectangle(
                [left + i, top + i, right - i, bottom - i],
                outline=self.colors[c])
        draw.rectangle(  # label background
            [tuple(text_origin), tuple(text_origin + label_size)],
            fill=self.colors[c])
        draw.text(text_origin, label, fill=(0, 0, 0), font=font)  # label text
        del draw

    end = timer()
    print(end - start)
    return image
generate() is at line 61 of yolo.py:
     
     
     
             

      
      
      
              
def generate(self):
    model_path = os.path.expanduser(self.model_path)  # resolve the model path
    assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.'  # the file must end with .h5

    # Load model, or construct model and load weights.
    num_anchors = len(self.anchors)  # num_anchors = 9; YOLOv3 has 9 anchor boxes
    num_classes = len(self.class_names)  # num_classes = 80; COCO has 80 classes
    is_tiny_version = num_anchors==6 # default setting is_tiny_version = False
    try:
        self.yolo_model = load_model(model_path, compile=False)  # load the model
    except:
        self.yolo_model = tiny_yolo_body(Input(shape=(None,None,3)), num_anchors//2, num_classes) \
            if is_tiny_version else yolo_body(Input(shape=(None,None,3)), num_anchors//3, num_classes)
        self.yolo_model.load_weights(self.model_path) # make sure model, anchors and classes match
    else:
        # model.layers[-1] is the last layer of the network and output_shape[-1] is the last
        # dimension of its output, e.g. (?,13,13,255).
        # 255 = 9/3*(80+5): 3 anchor boxes per feature map, 80 classes, and 5 = 4 box values + 1 confidence.
        assert self.yolo_model.layers[-1].output_shape[-1] == \
            num_anchors/len(self.yolo_model.output) * (num_classes + 5), \
            'Mismatch between model and given anchor and class sizes'

    print('{} model, anchors, and classes loaded.'.format(model_path))

      
      
      
              
    # Generate colors for drawing bounding boxes.
    # h (hue): x/len(self.class_names), s (saturation): 1.0, v (value): 1.0
    hsv_tuples = [(x / len(self.class_names), 1., 1.)
                  for x in range(len(self.class_names))]
    self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))  # convert HSV to RGB
    # hsv_to_rgb returns values in [0, 1] while RGB pixels use [0, 255], so multiply by 255.
    self.colors = list(
        map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),
            self.colors))
    np.random.seed(10101)  # fixed seed, so the colors are the same on every run
    np.random.shuffle(self.colors)  # shuffle so that adjacent classes get clearly different colors
    np.random.seed(None)  # reset the seed to the default

 

    # Generate output tensor targets for filtered bounding boxes.
    self.input_image_shape = K.placeholder(shape=(2, ))  # K.placeholder: a placeholder in Keras
    if self.gpu_num>=2:
        self.yolo_model = multi_gpu_model(self.yolo_model, gpus=self.gpu_num)
    boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors,
            len(self.class_names), self.input_image_shape,
            score_threshold=self.score, iou_threshold=self.iou)  # yolo_eval(): the YOLO evaluation function
    return boxes, scores, classes
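
The color generation step in generate() can be tried on its own. The following is a standalone sketch using only the standard library: it swaps np.random for a local random.Random instance (so the shuffled order will not match the one produced by NumPy), and the function name class_colors is made up for illustration.

```python
import colorsys
import random


def class_colors(num_classes, seed=10101):
    """Return one RGB color per class, evenly spaced in hue and then shuffled."""
    hsv = [(i / num_classes, 1.0, 1.0) for i in range(num_classes)]
    rgb = [colorsys.hsv_to_rgb(*c) for c in hsv]            # values in [0, 1]
    colors = [(int(r * 255), int(g * 255), int(b * 255)) for r, g, b in rgb]
    rng = random.Random(seed)   # local generator, the global random state is untouched
    rng.shuffle(colors)         # neighbouring class indices get unrelated hues
    return colors


colors = class_colors(80)
print(len(colors), colors[:3])
```

Using a local generator has the same intent as the seed / shuffle / seed(None) sequence above: the colors are reproducible without affecting other random numbers in the program.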

 

 


      
      
      
              
def yolo_eval(yolo_outputs,  # model outputs: [(?,13,13,255), (?,26,26,255), (?,52,52,255)]; ?: batch size; 13/26/52: multi-scale prediction; 255: predicted values, 3*(80+5)
              anchors,  # [(10,13), (16,30), (33,23), (30,61), (62,45), (59,119), (116,90), (156,198), (373,326)]
              num_classes,  # number of classes, 80 for COCO
              image_shape,  # TF placeholder holding the original image size (height, width)
              max_boxes=20,  # at most 20 boxes are kept per class per image
              score_threshold=.6,  # box score threshold; boxes below it are dropped. Lower it to keep more boxes, raise it to keep fewer.
              iou_threshold=.5):  # IoU threshold for boxes of the same class; overlapping boxes above it are removed. Raise it when many objects overlap, lower it otherwise.
    """Evaluate YOLO model on given input and return filtered boxes."""
    num_layers = len(yolo_outputs)  # number of output layers; num_layers = 3 -> 13, 26, 52
    # Default setting: each layer gets 3 anchor boxes, e.g. 13*13 gets [6,7,8], i.e. (116,90), (156,198), (373,326).
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]
    # yolo_outputs[0] has shape (?,13,13,255); dimensions 1 and 2 times 32 -> 13*32 = 416, so input_shape is (416, 416).
    input_shape = K.shape(yolo_outputs[0])[1:3] * 32
    boxes = []
    box_scores = []
    for l in range(num_layers):
        _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l],
            anchors[anchor_mask[l]], num_classes, input_shape, image_shape)
        boxes.append(_boxes)
        box_scores.append(_box_scores)
    boxes = K.concatenate(boxes, axis=0)  # join the boxes of all layers -> (?, 4)
    box_scores = K.concatenate(box_scores, axis=0)  # -> (?, 80)

    mask = box_scores >= score_threshold  # mask that keeps only scores at or above the threshold
    max_boxes_tensor = K.constant(max_boxes, dtype='int32')  # maximum number of boxes, 20
    boxes_ = []
    scores_ = []
    classes_ = []
    for c in range(num_classes):
        # TODO: use keras backend instead of tf.
        class_boxes = tf.boolean_mask(boxes, mask[:, c])  # boxes of class c that pass the mask
        class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])  # scores of class c that pass the mask
        nms_index = tf.image.non_max_suppression(  # run non-maximum suppression
            class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)
        class_boxes = K.gather(class_boxes, nms_index)  # K.gather: pick class_boxes by the indices in nms_index
        class_box_scores = K.gather(class_box_scores, nms_index)  # pick class_box_scores by the same indices
        classes = K.ones_like(class_box_scores, 'int32') * c  # class index c for every kept box
        boxes_.append(class_boxes)
        scores_.append(class_box_scores)
        classes_.append(classes)
    # K.concatenate joins tensors along an axis; boxes_ becomes (?, 4), where ? is the
    # number of boxes and 4 is (y_min, x_min, y_max, x_max).
    boxes_ = K.concatenate(boxes_, axis=0)
    scores_ = K.concatenate(scores_, axis=0)  # -> (?,)
    classes_ = K.concatenate(classes_, axis=0)  # -> (?,)

    return boxes_, scores_, classes_
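
To show what the per-class filtering in yolo_eval does, here is a plain-Python sketch of score thresholding followed by greedy non-maximum suppression for a single class. It is not the TensorFlow implementation; the helper names iou and nms and the example boxes are made up, and boxes are assumed to be (y_min, x_min, y_max, x_max).

```python
def iou(a, b):
    """Intersection over union of two boxes given as (y_min, x_min, y_max, x_max)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, max_boxes=20, iou_threshold=0.5, score_threshold=0.6):
    """Return the indices of the boxes kept for one class, best score first."""
    # Drop low-scoring boxes, then visit the rest from highest to lowest score.
    order = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if len(keep) >= max_boxes:
            break
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
print(nms(boxes, scores))  # [0, 2]
```

Box 1 overlaps box 0 with an IoU of about 0.68 and is suppressed, box 3 falls below the score threshold, and box 2 survives because it overlaps nothing.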
yolo_boxes_and_scores() is at line 176 of model.py:

      
      
      
              
def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape):
    '''Process Conv layer output'''
    # feats: one output layer, shape (?,13,13,255); anchors: the 3 anchor boxes of this layer
    # num_classes: number of classes (80); input_shape: (416,416); image_shape: original image size
    box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats,
        anchors, num_classes, input_shape)
    # yolo_head(): box_xy is the box center and box_wh the box width and height, both relative (0~1);
    # box_confidence is the objectness confidence; box_class_probs are the class confidences.
    # Convert the relative (0~1) box_xy and box_wh into real coordinates; boxes holds (y_min, x_min, y_max, x_max).
    boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape)
    # Flatten the per-cell values into a list of boxes: (?,13,13,3,4) -> (?,4), where ? is the number of boxes.
    boxes = K.reshape(boxes, [-1, 4])
    # Box score = objectness confidence * class confidence.
    box_scores = box_confidence * box_class_probs
    # Flatten the scores to (?,80), where ? is the number of boxes.
    box_scores = K.reshape(box_scores, [-1, num_classes])
    return boxes, box_scores
yolo_head() is at line 122 of model.py:

      
      
      
              
def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):  # same parameters as above
    """Convert final layer features to bounding box parameters."""
    num_anchors = len(anchors)  # num_anchors = 3
    # Reshape to batch, height, width, num_anchors, box_params.
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])  # -> (1,1,1,3,2)

    grid_shape = K.shape(feats)[1:3] # height, width: (?,13,13,255) -> (13,13)
    # grid_y and grid_x build the grid with arange, reshape and tile: grid_y holds the
    # y indices 0~12, grid_x the x indices 0~12, and concatenating the two gives grid.
    grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
        [1, grid_shape[1], 1, 1])
    grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
        [grid_shape[0], 1, 1, 1])
    grid = K.concatenate([grid_x, grid_y])
    grid = K.cast(grid, K.dtype(feats))  # K.cast(): give grid the same dtype as feats

    # Split the last dimension of feats so that the anchors are separated from
    # the per-anchor values (4 box values + 1 confidence + class scores).
    feats = K.reshape(
        feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])

    # Adjust predictions to each spatial grid point and anchor size.
    # Formulas for x, y, w, h: tx, ty, tw, th are the values in feats, and bx, by, bw, bh
    # are the outputs (see the bounding box formulas earlier in the article). sigmoid is σ.
    # In Python, "..." (Ellipsis) stands for all the leading dimensions, so each slice
    # below only applies to the last axis.
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.sigmoid(feats[..., 5:])


      
      
      
              
    if calc_loss == True:
        return grid, feats, box_xy, box_wh
    return box_xy, box_wh, box_confidence, box_class_probs
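
The reshape and the slices in yolo_head are easier to follow on a single grid cell. The sketch below splits one cell's flat 255-value output into its three anchors and then into the same four pieces that yolo_head slices out. It uses plain Python lists, and the function name split_cell_output is made up for illustration.

```python
NUM_ANCHORS = 3
NUM_CLASSES = 80


def split_cell_output(values, num_anchors=NUM_ANCHORS, num_classes=NUM_CLASSES):
    """Split one cell's flat output into per-anchor xy, wh, objectness and class scores."""
    step = num_classes + 5                       # 4 box values + 1 confidence + class scores
    assert len(values) == num_anchors * step
    parts = []
    for a in range(num_anchors):
        chunk = values[a * step:(a + 1) * step]  # the contiguous block of anchor a
        parts.append({
            "xy": chunk[0:2],          # tx, ty      -> feats[..., :2]
            "wh": chunk[2:4],          # tw, th      -> feats[..., 2:4]
            "objectness": chunk[4:5],  # confidence  -> feats[..., 4:5]
            "classes": chunk[5:],      # class scores -> feats[..., 5:]
        })
    return parts


cell = list(range(255))          # stand-in values, so positions are easy to read off
parts = split_cell_output(cell)
print(parts[1]["xy"], parts[2]["objectness"], len(parts[0]["classes"]))
# [85, 86] [174] 80
```

Each anchor owns a contiguous block of 85 values, which is why a plain reshape of the last dimension into (3, 85) is enough to separate them.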
yolo_correct_boxes() is at line 150 of model.py:

      
      
      
              
def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape):  # map boxes back to the original image
    '''Get corrected boxes'''
    box_yx = box_xy[..., ::-1]  # "::-1" reverses the last axis: (x, y) -> (y, x)
    box_hw = box_wh[..., ::-1]
    input_shape = K.cast(input_shape, K.dtype(box_yx))
    image_shape = K.cast(image_shape, K.dtype(box_yx))
    # Size of the image after letterbox scaling, before padding.
    new_shape = K.round(image_shape * K.min(input_shape/image_shape))
    offset = (input_shape-new_shape)/2./input_shape  # padding on one side, as a fraction of the input
    scale = input_shape/new_shape
    box_yx = (box_yx - offset) * scale  # remove the padding and rescale
    box_hw *= scale

    box_mins = box_yx - (box_hw / 2.)
    box_maxes = box_yx + (box_hw / 2.)
    boxes = K.concatenate([
        box_mins[..., 0:1],  # y_min
        box_mins[..., 1:2],  # x_min
        box_maxes[..., 0:1],  # y_max
        box_maxes[..., 1:2]  # x_max
    ])

    # Scale boxes back to original image shape.
    boxes *= K.concatenate([image_shape, image_shape])
    return boxes
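
The same correction can be followed by hand for a single box. This is a plain-Python sketch of the arithmetic above, not the Keras function itself; the name correct_box and the example sizes (a 640*480 image letterboxed into 416*416) are chosen for illustration.

```python
def correct_box(box_xy, box_wh, input_shape, image_shape):
    """Map a normalized box from the letterboxed input back to the original image.

    box_xy, box_wh: (x, y) center and (w, h) size, relative to the network input (0~1).
    input_shape, image_shape: (height, width).
    Returns (y_min, x_min, y_max, x_max) in original-image pixels.
    """
    in_h, in_w = input_shape
    img_h, img_w = image_shape
    ratio = min(in_h / img_h, in_w / img_w)
    new_h, new_w = round(img_h * ratio), round(img_w * ratio)   # scaled size before padding
    # Padding on the top/left side, as a fraction of the network input.
    off_y, off_x = (in_h - new_h) / 2.0 / in_h, (in_w - new_w) / 2.0 / in_w
    scale_y, scale_x = in_h / new_h, in_w / new_w
    cy = (box_xy[1] - off_y) * scale_y    # center relative to the unpadded image
    cx = (box_xy[0] - off_x) * scale_x
    h = box_wh[1] * scale_y
    w = box_wh[0] * scale_x
    return ((cy - h / 2.0) * img_h, (cx - w / 2.0) * img_w,
            (cy + h / 2.0) * img_h, (cx + w / 2.0) * img_w)


# A centered box on the network input maps to a centered box on the 640*480 image:
# approximately (120, 160, 360, 480).
print(correct_box((0.5, 0.5), (0.5, 0.375), (416, 416), (480, 640)))
```

The vertical padding added by the letterbox step (52 pixels at the top and bottom in this example) is what the offset and scale terms remove.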

 

 

 OK, that's all! Enjoy it!

 

 

References:

https://blog.csdn.net/qq_14845119/article/details/80335225

https://www.cnblogs.com/makefile/p/YOLOv3.html

https://www.colabug.com/4125223.html

 

