YOLOv3 network structure analysis

GitHub source code for this article: https://github.com/qqwweee/keras-yolo3

YOLOv3 paper: https://pjreddie.com/media/files/papers/YOLOv3.pdf

YOLOv3 official website: https://pjreddie.com/darknet/yolo/

I have recently become very interested in YOLOv3, read a lot of material, and built some related projects, so I am writing down my experience here for later reference.

YOLO, short for You Only Look Once, is an object detection algorithm based on a convolutional neural network (CNN).

YOLO design concept

As a whole, the YOLO algorithm uses a single CNN to detect targets end to end. The process is shown in Figure 1.

Figure 1

Specifically (based on YOLOv3):

1: Take an input image of any size and scale it, keeping the aspect ratio unchanged, until its longer side is 416. Paste the scaled image onto a 416*416 canvas and use that as the network input. That is, the input of the network is a 416*416, 3-channel RGB image.

2: Run the network. YOLO's CNN divides the image into S*S grid cells (YOLOv3 predicts at multiple scales: it outputs 3 layers with 13*13, 26*26, and 52*52 cells respectively). Each cell is responsible for detecting the targets whose center points fall inside it, as shown in Figure 2. Each cell predicts 3*(4+1+B) values, so for an S*S grid the prediction of each layer is a tensor of size S*S*3*(4+1+B). Here B is the number of categories (80 for the COCO dataset, so B=80), 3 is the number of anchor boxes per layer, 4 is the bounding box position and size (x, y, w, h), and 1 is the confidence.

3: Filter the boxes with non-maximum suppression (NMS), output the boxes (class_boxes) and their confidences (class_box_scores), generate the category information (classes), then assemble and return the final detection results.
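Step 3 can be illustrated with a minimal, class-agnostic sketch of score filtering followed by greedy non-maximum suppression. This is plain Python for illustration only and is not the repository's implementation; the names `iou` and `non_max_suppression` and the (x1, y1, x2, y2) box layout are assumptions made for this example, and the default thresholds simply reuse the 0.3 score and 0.45 IoU values that appear in the YOLO class defaults.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_threshold=0.3, iou_threshold=0.45):
    """Return the indices of the boxes that survive filtering and NMS."""
    # Drop low-confidence boxes, then visit the rest from best to worst.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap a better box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.1]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In this example the second box overlaps the first too strongly and is suppressed, and the last box falls below the score threshold. A per-class version would run the same routine once for each category.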

Figure 2 Figure 3

YOLOv3 network structure:

Multiscale:

YOLOv3 uses multi-scale prediction on three feature maps: 13*13, 26*26, and 52*52.
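The arithmetic behind the three scales can be checked with a small sketch. It assumes a 416*416 input, strides of 32, 16, and 8, 3 anchors per scale, and the 80 COCO classes; the helper names `grid_sizes`, `values_per_cell`, and `responsible_cell` are made up for this illustration.

```python
INPUT_SIZE = 416
STRIDES = (32, 16, 8)      # downsampling factor of each output scale
ANCHORS_PER_SCALE = 3
NUM_CLASSES = 80           # COCO


def grid_sizes(input_size=INPUT_SIZE, strides=STRIDES):
    """Side length S of the S*S grid at each scale."""
    return [input_size // s for s in strides]


def values_per_cell(num_classes=NUM_CLASSES, anchors_per_scale=ANCHORS_PER_SCALE):
    """Each anchor predicts 4 box values, 1 confidence, and the class scores."""
    return anchors_per_scale * (4 + 1 + num_classes)


def responsible_cell(center_x, center_y, stride):
    """(column, row) of the cell containing a center given in input pixels."""
    return int(center_x // stride), int(center_y // stride)


for stride, size in zip(STRIDES, grid_sizes()):
    print(size, size, values_per_cell(), responsible_cell(200, 120, stride))
```

Each printed line gives the grid height and width, the number of values per cell, and the cell that would be responsible for an object centered at pixel (200, 120).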

• Small scale: (13*13 feature map)

The network receives a 416*416 image and downsamples it with 5 stride-2 convolutions (416 / 2^5 = 13), giving a 13*13 output.

• Mesoscale: (26*26 feature map)

The feature map from a convolutional layer just before the small-scale output is upsampled by 2x and concatenated with the last 26*26 feature map of the backbone. The prediction on the merged features gives the 26*26 output.

• Large scale: (52*52 feature map)

The same operation is applied to the mesoscale features, giving the 52*52 output.
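The merge step used for the mesoscale and large-scale branches can be sketched on tiny nested lists. This is only a shape-level illustration, assuming nearest-neighbour 2x upsampling followed by concatenation along the channel axis; `upsample2x` and `concat_channels` are hypothetical helper names, and the feature maps are H x W x C lists rather than real tensors.

```python
def upsample2x(feature_map):
    """Nearest-neighbour 2x upsampling of an H x W x C nested list."""
    out = []
    for row in feature_map:
        wide = []
        for cell in row:
            # Repeat every cell twice along the width.
            wide.append(list(cell))
            wide.append(list(cell))
        # Repeat every widened row twice along the height.
        out.append(wide)
        out.append([list(cell) for cell in wide])
    return out


def concat_channels(map_a, map_b):
    """Concatenate two maps of equal spatial size along the channel axis."""
    return [[cell_a + cell_b for cell_a, cell_b in zip(row_a, row_b)]
            for row_a, row_b in zip(map_a, map_b)]


deep = [[[1, 2]]]                            # 1 x 1 x 2, coarse but deep
shallow = [[[10], [11]], [[12], [13]]]       # 2 x 2 x 1, finer but shallow
merged = concat_channels(upsample2x(deep), shallow)
print(len(merged), len(merged[0]), len(merged[0][0]))
print(merged)
```

The merged map has the spatial size of the shallow map and the channels of both inputs, which is how a 13*13 branch can feed a 26*26 prediction.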

Benefit: the network learns deep and shallow features at the same time. Adjacent spatial positions of the shallow feature map are stacked into different channels (rather than kept as spatial locations), similar to the identity mapping in ResNet. In YOLOv2's passthrough layer, for example, this turns a 26x26x512 feature map into a 13x13x2048 one, which is then concatenated with the original deep feature map. The model thereby gains fine-grained features and recognizes small targets better.
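The stacking of neighbouring positions into channels (26x26x512 becoming 13x13x2048) is a space-to-depth rearrangement. The sketch below shows the idea on a 4x4x1 map; the function name `space_to_depth` is chosen for this example, and the channel ordering is an assumption that may differ from what a particular framework or Darknet's reorg layer produces.

```python
def space_to_depth(feature_map, block=2):
    """Move each block x block spatial patch into the channel axis.

    feature_map is an H x W x C nested list; the result has shape
    (H / block) x (W / block) x (C * block * block).
    """
    height = len(feature_map)
    width = len(feature_map[0])
    out = []
    for i in range(0, height, block):
        row = []
        for j in range(0, width, block):
            cell = []
            for di in range(block):
                for dj in range(block):
                    cell.extend(feature_map[i + di][j + dj])
            row.append(cell)
        out.append(row)
    return out


# A 4 x 4 x 1 map whose single channel holds the values 0..15.
fm = [[[r * 4 + c] for c in range(4)] for r in range(4)]
stacked = space_to_depth(fm)
print(len(stacked), len(stacked[0]), len(stacked[0][0]))
print(stacked[0][0])
```

The spatial size halves while the channel count grows fourfold, so no values are lost, which is why the fine-grained information survives the merge with the deeper map.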

Anchor box:

YOLOv3 uses 9 anchor boxes in total, obtained by k-means clustering. On the COCO dataset the nine clusters are: (10*13); (16*30); (33*23); (30*61); (62*45); (59*119); (116*90); (156*198); (373*326).

Feature maps of different sizes are assigned prior (anchor) boxes of different sizes.

13*13 feature map corresponds to [(116*90), (156*198), (373*326)]
26*26 feature map corresponds to [(30*61), (62*45), (59*119)]
52*52 feature map corresponds to [(10*13), (16*30), (33*23)]

Reason: the larger the feature map, the smaller the receptive field, so it is more sensitive to small targets and is given the small anchor boxes.

The smaller the feature map, the larger the receptive field, so it is more sensitive to large targets and is given the large anchor boxes.
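The assignment above can be written as an index mask over the sorted anchor list. The sketch below is an illustration of that mapping; the dictionary `ANCHOR_MASK` and the helper `anchors_for_grid` are names assumed for this example.

```python
# The nine COCO anchors (width, height), sorted from small to large.
ANCHORS = [(10, 13), (16, 30), (33, 23),
           (30, 61), (62, 45), (59, 119),
           (116, 90), (156, 198), (373, 326)]

# Grid size -> indices of the anchors used at that scale.
ANCHOR_MASK = {13: [6, 7, 8], 26: [3, 4, 5], 52: [0, 1, 2]}


def anchors_for_grid(grid_size):
    """Anchor boxes assigned to the feature map of the given size."""
    return [ANCHORS[i] for i in ANCHOR_MASK[grid_size]]


for size in (13, 26, 52):
    print(size, anchors_for_grid(size))
```

The coarsest grid (13*13) receives the three largest anchors and the finest grid (52*52) the three smallest.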

Bounding box prediction:

The network predicts tx, ty, tw, th.

Apply a sigmoid to tx and ty and add the corresponding offsets (Cx, Cy below)
Apply exp to tw and th and multiply by the corresponding anchor sizes
Multiply the decoded values that are in grid units by the corresponding stride, i.e. 416/13, 416/26, 416/52
Finally, apply a sigmoid to the objectness and class confidences to get probabilities in the range 0~1. The sigmoid replaces the softmax of the previous version because softmax amplifies the largest class probability and suppresses the others.
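The difference between softmax and per-class sigmoid can be seen on a small set of made-up logits. The values below are arbitrary and the helper names are chosen for this sketch: two classes have almost equally strong evidence, as can happen when labels overlap.

```python
import math


def softmax(logits):
    """Softmax over a list of logits; the outputs compete and sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for stability
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function; each class is scored independently."""
    return 1.0 / (1.0 + math.exp(-x))


logits = [2.0, 1.9, -3.0]  # hypothetical class logits for one box
print([round(p, 3) for p in softmax(logits)])
print([round(sigmoid(x), 3) for x in logits])
```

With softmax the two strong classes have to share the probability mass, while with sigmoid both can receive a high score at the same time.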

(tx, ty): the predicted offset of the target center relative to the top-left corner of the grid cell that contains it. After the sigmoid, the value lies in [0, 1], e.g. (0.3, 0.4) in the figure.

(cx, cy): the offset, counted in grid cells, of that cell's top-left corner from the top-left corner of the image, e.g. (1, 1) in the figure.

(pw, ph): the width and height of the anchor box

(tw, th): the predicted width and height of the box, expressed as log-scale factors relative to the anchor

PS: The final box coordinates are bx, by, bw, bh. What the network learns to predict is tx, ty, tw, th.
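The decoding of (tx, ty, tw, th) into (bx, by, bw, bh) can be sketched as follows. The sketch assumes that the anchor sizes (pw, ph) are already given in input-image pixels, as in the COCO anchor list, so only the center is scaled by the stride; if the anchors were expressed in grid units instead, the width and height would be scaled as well. The function name `decode_box` is made up for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """Turn raw network outputs into a box in input-image pixels.

    (cx, cy) is the cell offset in grid units, (pw, ph) the anchor size
    in pixels (an assumption of this sketch), stride the pixels per cell.
    """
    bx = (sigmoid(tx) + cx) * stride   # center x: grid units -> pixels
    by = (sigmoid(ty) + cy) * stride   # center y: grid units -> pixels
    bw = pw * math.exp(tw)             # width relative to the anchor
    bh = ph * math.exp(th)             # height relative to the anchor
    return bx, by, bw, bh


# Cell (1, 1) on the 13*13 grid (stride 32) with the (116, 90) anchor.
print(decode_box(0.0, 0.0, 0.0, 0.0, cx=1, cy=1, pw=116, ph=90, stride=32))
```

With all raw outputs at zero the center sits in the middle of the cell and the box has exactly the anchor's size, which shows that the network only has to learn small corrections around each anchor.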

Loss function

YOLOv3 replaces the softmax loss used in YOLOv2 with a logistic (binary cross-entropy) loss.

This picture is for reference only and differs slightly from YOLOv3.
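As a rough illustration of the logistic loss on the class outputs, the sketch below sums an independent binary cross-entropy term per class. It covers only the class part of the loss and is not the repository's loss function; `binary_cross_entropy` and `class_loss` are names assumed for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(p, y, eps=1e-7):
    """Logistic loss for one probability p and one 0/1 target y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def class_loss(logits, targets):
    """Sum of independent per-class logistic losses for one box."""
    return sum(binary_cross_entropy(sigmoid(z), t)
               for z, t in zip(logits, targets))


print(class_loss([3.0, -2.0, 0.5], [1, 0, 0]))
```

Because every class has its own term, a box may carry more than one positive label without the classes competing with each other.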

Code walkthrough: the detection part of the source code

Usage

Clone the repository: git clone https://github.com/qqwweee/keras-yolo3
Download the YOLOv3 weights from the YOLO website
Convert the Darknet version of the YOLO model to a Keras model
Run YOLO detection

Initialization parameters of the YOLO class:

class YOLO(object):
    _defaults = {
        "model_path": 'model_data/yolo.h5',  # trained model
        "anchors_path": 'model_data/yolo_anchors.txt',  # the 9 anchor boxes, sorted from small to large
        "classes_path": 'model_data/coco_classes.txt',  # class list
        "score": 0.3,  # score threshold
        "iou": 0.45,  # IoU threshold
        "model_image_size": (416, 416),  # input image size
        "gpu_num": 1,  # number of GPUs
    }

Run yolo_video.py:

def detect_img(yolo):
    while True:
        img = input('Input image filename:')  # path of the image to detect
        try:
            image = Image.open(img)
        except:
            print('Open Error! Try again!')
            continue
        else:
            r_image = yolo.detect_image(image)  # run the detection in yolo.detect_image
            r_image.show()
    yolo.close_session()

The detect_image() function is at line 102 of yolo.py:

def detect_image(self, image):
    start = timer()

    if self.model_image_size != (None, None):  # a fixed model input size is configured
        # model_image_size[0] and [1] are the input height and width; both must be multiples of 32
        assert self.model_image_size[0]%32 == 0, 'Multiples of 32 required'
        assert self.model_image_size[1]%32 == 0, 'Multiples of 32 required'
        # letterbox_image() is defined at line 20 of utils.py. It takes (image, (w=416, h=416))
        # and returns a new image that is padded so that the aspect ratio stays unchanged.
        boxed_image = letterbox_image(image, tuple(reversed(self.model_image_size)))
    else:
        new_image_size = (image.width - (image.width % 32),
                          image.height - (image.height % 32))
        boxed_image = letterbox_image(image, new_image_size)
    image_data = np.array(boxed_image, dtype='float32')
    print(image_data.shape)  # (416, 416, 3)
    image_data /= 255.  # normalize to [0, 1]
    # Add a batch dimension -> (1, 416, 416, 3) to match the network input format (batch, h, w, c)
    image_data = np.expand_dims(image_data, 0)

    out_boxes, out_scores, out_classes = self.sess.run(
        # boxes, scores and classes are built in generate(), at line 61 of yolo.py
        [self.boxes, self.scores, self.classes],
        feed_dict={  # values fed to the graph
            self.yolo_model.input: image_data,  # image data
            self.input_image_shape: [image.size[1], image.size[0]],  # original image size (h, w)
            K.learning_phase(): 0  # learning phase: 0 = test mode, 1 = training mode
        })

    print('Found {} boxes for {}'.format(len(out_boxes), 'img'))

    # Draw the boxes with the Pillow drawing library: the font size and line
    # thickness are set automatically, then each box is drawn with its class label.
    font = ImageFont.truetype(font='font/FiraMono-Medium.otf',
                size=np.floor(3e-2 * image.size[1] + 0.5).astype('int32'))  # font
    thickness = (image.size[0] + image.size[1]) // 300  # line thickness

    for i, c in reversed(list(enumerate(out_classes))):
        predicted_class = self.class_names[c]  # class name
        box = out_boxes[i]  # box
        score = out_scores[i]  # confidence

        label = '{} {:.2f}'.format(predicted_class, score)  # label text
        draw = ImageDraw.Draw(image)  # drawing context
        label_size = draw.textsize(label, font)  # size of the label text

        top, left, bottom, right = box
        top = max(0, np.floor(top + 0.5).astype('int32'))
        left = max(0, np.floor(left + 0.5).astype('int32'))
        bottom = min(image.size[1], np.floor(bottom + 0.5).astype('int32'))
        right = min(image.size[0], np.floor(right + 0.5).astype('int32'))
        print(label, (left, top), (right, bottom))  # box corners

        if top - label_size[1] >= 0:  # position of the label text
            text_origin = np.array([left, top - label_size[1]])
        else:
            text_origin = np.array([left, top + 1])

        # My kingdom for a good redistributable image drawing library.
        for i in range(thickness):  # draw the box
            draw.rectangle(
                [left + i, top + i, right - i, bottom - i],
                outline=self.colors[c])
        draw.rectangle(  # label background
            [tuple(text_origin), tuple(text_origin + label_size)],
            fill=self.colors[c])
        draw.text(text_origin, label, fill=(0, 0, 0), font=font)  # label text
        del draw

    end = timer()
    print(end - start)
    return image
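The letterbox step that detect_image() relies on comes down to a little geometry: pick one scale factor for both sides, then center the resized image on the target canvas. The sketch below computes only the resized size and the paste offset, without touching any pixels; `letterbox_geometry` is a name invented for this illustration and is not a function of the repository.

```python
def letterbox_geometry(image_w, image_h, target_w=416, target_h=416):
    """Resized size and paste offset for an aspect-preserving fit."""
    # One scale for both sides keeps the aspect ratio unchanged.
    scale = min(target_w / image_w, target_h / image_h)
    new_w = int(image_w * scale)
    new_h = int(image_h * scale)
    # Center the resized image; the remaining border is padding.
    offset = ((target_w - new_w) // 2, (target_h - new_h) // 2)
    return (new_w, new_h), offset


print(letterbox_geometry(832, 624))
```

For an 832*624 image the longer side is scaled to 416 and the shorter side to 312, leaving 52 rows of padding above and below.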

generate() is at line 61 of yolo.py:

def generate(self):
    model_path = os.path.expanduser(self.model_path)  # expand the model path
    assert model_path.endswith('.h5'), 'Keras model or weights must be a .h5 file.'  # the model file must end with .h5

    # Load model, or construct model and load weights.
    num_anchors = len(self.anchors)  # num_anchors = 9: YOLOv3 has 9 prior boxes
    num_classes = len(self.class_names)  # num_classes = 80: COCO has 80 classes
    is_tiny_version = num_anchors==6  # default setting: is_tiny_version = False
    try:
        self.yolo_model = load_model(model_path, compile=False)  # load the model
    except:
        self.yolo_model = tiny_yolo_body(Input(shape=(None,None,3)), num_anchors//2, num_classes) \
            if is_tiny_version else yolo_body(Input(shape=(None,None,3)), num_anchors//3, num_classes)
        self.yolo_model.load_weights(self.model_path)  # make sure the model matches the anchors and classes
    else:
        # model.layers[-1] is the last layer of the network; output_shape[-1] is the
        # last dimension of its output, e.g. (?, 13, 13, 255).
        # 255 = 9/3 * (80 + 5). 9/3: 3 anchor boxes per feature map; 80: classes;
        # 5 = 4 + 1: the 4 box values plus 1 confidence.
        assert self.yolo_model.layers[-1].output_shape[-1] == \
            num_anchors/len(self.yolo_model.output) * (num_classes + 5), \
            'Mismatch between model and given anchor and class sizes'

    print('{} model, anchors, and classes loaded.'.format(model_path))

    # Generate the colors used to draw the bounding boxes.
    # HSV: h (hue) = x / len(self.class_names), s (saturation) = 1.0, v (value) = 1.0
    hsv_tuples = [(x / len(self.class_names), 1., 1.)
                  for x in range(len(self.class_names))]
    self.colors = list(map(lambda x: colorsys.hsv_to_rgb(*x), hsv_tuples))  # HSV -> RGB
    # hsv_to_rgb returns values in [0, 1], while RGB values lie in [0, 255], so multiply by 255
    self.colors = list(
        map(lambda x: (int(x[0] * 255), int(x[1] * 255), int(x[2] * 255)),
            self.colors))
    np.random.seed(10101)  # fixed seed so the colors are the same on every run
    np.random.shuffle(self.colors)  # shuffle so that adjacent classes get dissimilar colors
    np.random.seed(None)  # reset the seed to the default

# Generate output tensor targets for filtered bounding boxes.
self.input_image_shape = K.placeholder(shape=(2, ))  # K.placeholder: a placeholder in the Keras backend
if self.gpu_num>=2:
    self.yolo_model = multi_gpu_model(self.yolo_model, gpus=self.gpu_num)
boxes, scores, classes = yolo_eval(self.yolo_model.output, self.anchors,
        len(self.class_names), self.input_image_shape,
        score_threshold=self.score, iou_threshold=self.iou)  # yolo_eval(): the YOLO evaluation function
return boxes, scores, classes
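
The color step in the code above can be tried on its own. The following is a minimal sketch using only the standard library; `make_class_colors` is a hypothetical helper name, the evenly spaced hue list is an assumed definition of `hsv_tuples`, and `random.Random` stands in for the `np.random` calls of the original.

```python
import colorsys
import random


def make_class_colors(num_classes, seed=10101):
    """Return one (R, G, B) tuple in 0..255 per class, shuffled reproducibly."""
    # Assumed definition of hsv_tuples: evenly spaced hues, full saturation and value.
    hsv_tuples = [(i / num_classes, 1.0, 1.0) for i in range(num_classes)]
    colors = [colorsys.hsv_to_rgb(*hsv) for hsv in hsv_tuples]
    # hsv_to_rgb yields floats in [0, 1]; scale them to 8-bit channel values.
    colors = [(int(r * 255), int(g * 255), int(b * 255)) for r, g, b in colors]
    # A private generator keeps the shuffle reproducible without touching global state.
    random.Random(seed).shuffle(colors)
    return colors


colors = make_class_colors(80)
print(len(colors), colors[:3])
```

Using a private generator avoids the seed, shuffle, reset sequence that the original needs because it works on NumPy's global random state.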


def yolo_eval(yolo_outputs,  # model outputs, in the form [(?,13,13,255), (?,26,26,255), (?,52,52,255)]; ?: batch size; 13/26/52: multi-scale prediction; 255: predicted values per cell (3*(80+5))
              anchors,  # [(10,13), (16,30), (33,23), (30,61), (62,45), (59,119), (116,90), (156,198), (373,326)]
              num_classes,  # number of classes; 80 for the COCO set
              image_shape,  # TF placeholder fed with the original image size (height, width)
              max_boxes=20,  # at most 20 boxes are detected per class per image
              score_threshold=.6,  # box score threshold; boxes scoring below it are removed. Lower it to keep more boxes, raise it to keep fewer
              iou_threshold=.5):  # IoU threshold between boxes of the same class; overlapping boxes whose IoU exceeds it are removed. Raise it when many objects overlap, lower it when few do
    """Evaluate YOLO model on given input and return filtered boxes."""

    num_layers = len(yolo_outputs)  # number of YOLO output layers; num_layers = 3 -> 13/26/52
    anchor_mask = [[6,7,8], [3,4,5], [0,1,2]] if num_layers==3 else [[3,4,5], [1,2,3]]
    # default setting: each layer is assigned 3 anchor boxes, e.g. the 13*13 layer gets [6,7,8], i.e. [(116,90), (156,198), (373,326)]
    input_shape = K.shape(yolo_outputs[0])[1:3] * 32
    # yolo_outputs[0] has shape (?,13,13,255); dimensions 1 and 2 (height and width) are each multiplied by 32 -> 13*32=416; input_shape: (416,416)

    boxes = []
    box_scores = []
    for l in range(num_layers):
        _boxes, _box_scores = yolo_boxes_and_scores(yolo_outputs[l],
            anchors[anchor_mask[l]], num_classes, input_shape, image_shape)
        boxes.append(_boxes)
        box_scores.append(_box_scores)
    boxes = K.concatenate(boxes, axis=0)  # K.concatenate: join the per-layer results into one tensor -> (?,4)
    box_scores = K.concatenate(box_scores, axis=0)  # -> (?,80)

    mask = box_scores >= score_threshold  # boolean mask: drops scores below the threshold and keeps those at or above it
    max_boxes_tensor = K.constant(max_boxes, dtype='int32')  # maximum number of detection boxes: 20
    boxes_ = []
    scores_ = []
    classes_ = []
    for c in range(num_classes):
        # TODO: use keras backend instead of tf.
        class_boxes = tf.boolean_mask(boxes, mask[:, c])  # select the boxes of class c with the mask
        class_box_scores = tf.boolean_mask(box_scores[:, c], mask[:, c])  # select the scores of class c with the mask
        nms_index = tf.image.non_max_suppression(  # run non-maximum suppression
            class_boxes, class_box_scores, max_boxes_tensor, iou_threshold=iou_threshold)
        class_boxes = K.gather(class_boxes, nms_index)  # K.gather: pick class_boxes by the indices in nms_index
        class_box_scores = K.gather(class_box_scores, nms_index)  # pick class_box_scores by the indices in nms_index
        classes = K.ones_like(class_box_scores, 'int32') * c  # class index c, repeated once per kept box
        boxes_.append(class_boxes)
        scores_.append(class_box_scores)
        classes_.append(classes)
    boxes_ = K.concatenate(boxes_, axis=0)
    # K.concatenate() joins data of the same dimensionality; it merges the per-class boxes_ into one tensor -> (?,4); ?: number of boxes; 4: (y_min, x_min, y_max, x_max)
    scores_ = K.concatenate(scores_, axis=0)  # -> (?,)
    classes_ = K.concatenate(classes_, axis=0)  # -> (?,)

    return boxes_, scores_, classes_
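
To make the per-class filtering in yolo_eval() more concrete, here is a rough sketch in plain Python of what happens for a single class: a score mask followed by greedy non-maximum suppression. It illustrates the general technique under the assumption that boxes are (y_min, x_min, y_max, x_max) tuples. It is not the TensorFlow implementation, and `iou` and `nms_one_class` are hypothetical names.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (y_min, x_min, y_max, x_max)."""
    inter_h = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms_one_class(boxes, scores, max_boxes=20, score_threshold=0.6, iou_threshold=0.5):
    """Return the indices of the kept boxes for one class, best score first."""
    # Step 1: score mask, the counterpart of `mask = box_scores >= score_threshold`.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    # Step 2: greedy suppression in descending score order.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        if len(kept) >= max_boxes:
            break
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.3]
print(nms_one_class(boxes, scores))
```

In this toy input the second box overlaps the first with an IoU of 81/119 and is suppressed, the third box does not overlap anything, and the fourth never passes the score mask.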
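yolo_boxes_and_scores() is at line 176 of model.py:

def yolo_boxes_and_scores(feats, anchors, num_classes, input_shape, image_shape):
    # feats: the output tensor of one layer, shape (?,13,13,255); anchors: the 3 anchor boxes assigned to that layer
    # num_classes: number of classes (80); input_shape: (416,416); image_shape: size of the original image
    '''Process Conv layer output'''
    box_xy, box_wh, box_confidence, box_class_probs = yolo_head(feats,
        anchors, num_classes, input_shape)
    # yolo_head(): box_xy is the box center, as a relative position in (0~1); box_wh is the box width and height, as relative values in (0~1);
    # box_confidence is the confidence that the box contains an object; box_class_probs is the per-class confidence
    boxes = yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape)
    # convert the relative (0~1) values of box_xy and box_wh into real coordinates; the output boxes hold (y_min, x_min, y_max, x_max)
    boxes = K.reshape(boxes, [-1, 4])
    # reshape: turn the values of the different grid cells into a flat list of boxes, i.e. (?,13,13,3,4) -> (?,4); ?: number of boxes
    box_scores = box_confidence * box_class_probs
    # box score = box confidence * class confidence
    box_scores = K.reshape(box_scores, [-1, num_classes])
    # reshape: flatten the box scores to (?,80); ?: number of boxes
    return boxes, box_scores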
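
The two ideas in yolo_boxes_and_scores(), the score product and the flattening into a list of boxes, can be checked with a few lines of plain Python. This is only a sketch: `box_scores` and `num_candidate_boxes` are hypothetical helper names, a batch size of 1 is assumed, and the strides 32, 16 and 8 are derived from the 13/26/52 grids of a 416 input.

```python
def box_scores(box_confidence, class_probs):
    """Per-class score of one box: objectness times each class probability."""
    return [box_confidence * p for p in class_probs]


def num_candidate_boxes(input_size=416, strides=(32, 16, 8), anchors_per_layer=3):
    """Rows of the flattened (?, 4) box list over all output layers, for one image."""
    return sum((input_size // s) ** 2 * anchors_per_layer for s in strides)


# One box with objectness 0.9 and three hypothetical class probabilities.
scores = box_scores(0.9, [0.1, 0.8, 0.05])
print(scores)
# Which classes would pass the default score_threshold of 0.6?
print([s >= 0.6 for s in scores])
# 13*13*3 + 26*26*3 + 52*52*3 candidate boxes per image.
print(num_candidate_boxes())
```

So before thresholding and NMS, each 416*416 image yields 10647 candidate boxes, each carrying one score per class.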
yolo_head() is at line 122 of model.py:

def yolo_head(feats, anchors, num_classes, input_shape, calc_loss=False):  # same parameters as above
    """Convert final layer features to bounding box parameters."""
    num_anchors = len(anchors)  # num_anchors = 3
    # Reshape to batch, height, width, num_anchors, box_params.
    anchors_tensor = K.reshape(K.constant(anchors), [1, 1, 1, num_anchors, 2])  # reshape -> (1,1,1,3,2)

    grid_shape = K.shape(feats)[1:3]  # height, width; (?,13,13,255) -> (13,13)

    # grid_y and grid_x are used to build the grid: combining arange, reshape and tile creates grid_y with the values 0~12 along the y axis and grid_x with the values 0~12 along the x axis; concatenating the two gives grid
    grid_y = K.tile(K.reshape(K.arange(0, stop=grid_shape[0]), [-1, 1, 1, 1]),
        [1, grid_shape[1], 1, 1])
    grid_x = K.tile(K.reshape(K.arange(0, stop=grid_shape[1]), [1, -1, 1, 1]),
        [grid_shape[0], 1, 1, 1])
    grid = K.concatenate([grid_x, grid_y])
    grid = K.cast(grid, K.dtype(feats))  # K.cast(): convert the values in grid to the same dtype as the values in feats

    feats = K.reshape(
        feats, [-1, grid_shape[0], grid_shape[1], num_anchors, num_classes + 5])
    # split the last dimension of feats so that the anchors are separated from the remaining data (class scores + 4 box values + box confidence)

    # Adjust predictions to each spatial grid point and anchor size.
    # formulas for x, y, w, h: tx, ty, tw and th are the raw feats values, while bx, by, bw and bh are the output values, as shown in the figure below
    box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
    box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))
    box_confidence = K.sigmoid(feats[..., 4:5])
    box_class_probs = K.sigmoid(feats[..., 5:])
    # sigmoid: σ
    # the "..." (ellipsis) operator in Python indexing leaves the other dimensions unchanged, so only the first or last dimension is sliced

    if calc_loss == True:
        return grid, feats, box_xy, box_wh
    return box_xy, box_wh, box_confidence, box_class_probs
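
The decoding arithmetic in yolo_head() is easier to follow for a single grid cell and a single anchor. Below is a small sketch in plain Python of the same formulas; `make_grid` and `decode_cell` are hypothetical names, and the cell (6, 6) with the anchor (116, 90) on the 13*13 layer is just an assumed example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_grid(grid_h, grid_w):
    """grid[y][x] == (x, y), the same pairing as K.concatenate([grid_x, grid_y])."""
    return [[(x, y) for x in range(grid_w)] for y in range(grid_h)]


def decode_cell(t, cell_xy, anchor_wh, grid_wh=(13, 13), input_wh=(416, 416)):
    """Decode the raw outputs t = (tx, ty, tw, th, to) of one anchor in one cell.

    Returns the center and size as fractions of the network input, plus objectness.
    """
    tx, ty, tw, th, to = t
    cx, cy = cell_xy
    # Center: sigmoid keeps the offset inside the cell; the grid size normalizes it.
    bx = (sigmoid(tx) + cx) / grid_wh[0]
    by = (sigmoid(ty) + cy) / grid_wh[1]
    # Size: exponential scaling of the anchor, normalized by the input size in pixels.
    bw = math.exp(tw) * anchor_wh[0] / input_wh[0]
    bh = math.exp(th) * anchor_wh[1] / input_wh[1]
    return (bx, by), (bw, bh), sigmoid(to)


grid = make_grid(13, 13)
xy, wh, conf = decode_cell((0.0, 0.0, 0.0, 0.0, 0.0), grid[6][6], (116, 90))
print(xy, wh, conf)
```

With all raw outputs at zero, the box sits in the middle of its cell and has exactly the anchor's size, which is why the anchors act as the starting shapes for each prediction.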
yolo_correct_boxes() is at line 150 of model.py:

def yolo_correct_boxes(box_xy, box_wh, input_shape, image_shape):  # get the corrected x, y, w, h
    '''Get corrected boxes'''
    box_yx = box_xy[..., ::-1]  # "::-1" reverses the order of the values
    box_hw = box_wh[..., ::-1]
    input_shape = K.cast(input_shape, K.dtype(box_yx))
    image_shape = K.cast(image_shape, K.dtype(box_yx))
    new_shape = K.round(image_shape * K.min(input_shape/image_shape))  # size of the resized image inside the padded input
    offset = (input_shape-new_shape)/2./input_shape  # relative padding on one side
    scale = input_shape/new_shape  # factor that stretches the unpadded region back to (0~1)
    box_yx = (box_yx - offset) * scale
    box_hw *= scale

    box_mins = box_yx - (box_hw / 2.)
    box_maxes = box_yx + (box_hw / 2.)
    boxes = K.concatenate([
        box_mins[..., 0:1],  # y_min
        box_mins[..., 1:2],  # x_min
        box_maxes[..., 0:1],  # y_max
        box_maxes[..., 1:2]  # x_max
    ])

    # Scale boxes back to original image shape.
    boxes *= K.concatenate([image_shape, image_shape])
    return boxes
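
A worked example helps to see what yolo_correct_boxes() undoes. The sketch below repeats the same arithmetic for one box in plain Python; `correct_box` is a hypothetical name and the 832*1664 (height * width) image is an assumed input chosen so that the numbers come out exact.

```python
def correct_box(box_xy, box_wh, input_hw, image_hw):
    """Map one box from padded network-input fractions to original-image pixels.

    box_xy and box_wh are (x, y) and (w, h) as fractions of the network input.
    Returns (y_min, x_min, y_max, x_max) in pixels of the original image.
    """
    box_yx = box_xy[::-1]
    box_hw = box_wh[::-1]
    # Uniform resize factor that fits the image into the input without distortion.
    ratio = min(input_hw[0] / image_hw[0], input_hw[1] / image_hw[1])
    new_hw = [round(image_hw[i] * ratio) for i in range(2)]
    out_min, out_max = [], []
    for i in range(2):  # i = 0 is the y axis, i = 1 is the x axis
        offset = (input_hw[i] - new_hw[i]) / 2.0 / input_hw[i]  # padding on one side
        scale = input_hw[i] / new_hw[i]  # stretch the unpadded region back to [0, 1]
        center = (box_yx[i] - offset) * scale
        size = box_hw[i] * scale
        out_min.append((center - size / 2.0) * image_hw[i])
        out_max.append((center + size / 2.0) * image_hw[i])
    return (out_min[0], out_min[1], out_max[0], out_max[1])


# The image is resized to 208*416 and padded with 104 rows above and below.
print(correct_box((0.5, 0.5), (0.5, 0.25), (416, 416), (832, 1664)))
```

Here the padding takes up a quarter of the input height on each side, so the vertical offset is 0.25 and the vertical scale is 2, while the horizontal axis is left unchanged.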

OK, that's all! Enjoy it!

reference:

https://blog.csdn.net/qq_14845119/article/details/80335225

https://www.cnblogs.com/makefile/p/YOLOv3.html

https://www.colabug.com/4125223.html

Reposted from: https://blog.csdn.net/KKKSQJ/article/details/83587138

Based on keras-yolov3, understanding of the principle and code details
