yolo-tensorflow: Walkthrough of a Reimplementation

Reposted article. 2018-10-01 00:14

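I came across a TensorFlow reimplementation of YOLOv3 and am recording my notes from reading the code here. The reimplementation does not strike me as very well written, so I will supplement it with parts of a Keras reimplementation by someone else.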

TensorFlow code: https://blog.csdn.net/IronMastiff/article/details/79940118

The source code is organized into the following parts:

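train.py is the main program and trains the network on your own dataset; eval.py runs prediction with the trained weights. Reader reads the data and labels. config.yml holds the parameters used during training, and eval_config.yml holds the parameters used during prediction. The utils package contains the network structure, the IOU computation and other intermediate steps. The programs in utils are covered first.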

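Net.py: defines the network structure that extracts features from the image. The backbone is Darknet-53. In the Darknet-53 structure diagram, 1x, 2x and so on mean that the boxed block is repeated once, twice and so on. Residual means that the block's output is added element-wise to the block's input, a shortcut connection as in a residual network.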
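As a rough check on the "1x, 2x" notation, the sketch below walks through the Darknet-53 stages in plain Python. The repeat counts (1, 2, 8, 8, 4) and channel widths follow the published Darknet-53 table. The 416x416 input size is an assumption chosen to match the 13x13 output mentioned in this article, and the names `DARKNET53_STAGES` and `darknet53_summary` are illustrative, not taken from Net.py.

```python
# Each stage is a stride-2 3x3 convolution that halves the resolution,
# followed by `repeats` residual blocks (1x1 conv + 3x3 conv + shortcut add).
# Tuples are (output_channels, repeats).
DARKNET53_STAGES = [(64, 1), (128, 2), (256, 8), (512, 8), (1024, 4)]


def darknet53_summary(input_size=416):
    """Return (number of conv layers, [(channels, repeats, feature_map_size), ...])."""
    size = input_size
    conv_layers = 1  # the initial 3x3 convolution with 32 filters
    rows = []
    for channels, repeats in DARKNET53_STAGES:
        size //= 2                       # the stride-2 downsampling convolution
        conv_layers += 1 + 2 * repeats   # downsample conv + two convs per block
        rows.append((channels, repeats, size))
    return conv_layers, rows


conv_layers, rows = darknet53_summary()
for channels, repeats, size in rows:
    print(f"{repeats}x residual, {channels} channels, {size}x{size} feature map")
print("convolutional layers:", conv_layers)
```

Under these assumptions the last three stages produce 52x52, 26x26 and 13x13 feature maps, and the 52 convolutional layers plus the classifier's fully connected layer account for the "53" in the name.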

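The feature_extractor function extracts features at three scales, scale1, scale2 and scale3, which are the outputs of the last, second-to-last and third-to-last boxed blocks of the network. The scales function then applies 1x1 and 3x3 convolutions so that the features at the three scales interact, and returns the three resulting feature maps.
The final outputs at the three scales are 13x13x75, 26x26x75 and 52x52x75; see the code for the details of the interaction. During training, one scale is selected to take part in the loss computation, and select_things is what picks the features of the chosen scale.
I do not fully understand how this is meant to be used. Does the scale have to be changed by hand in the config file, and if so, how should it be chosen?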
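The 75 channels and the three grid sizes can be reproduced with a little arithmetic. The sketch below assumes a 416x416 input, 3 anchor boxes per grid cell and the 20 Pascal VOC classes. These values are assumptions that happen to match the shapes above; they are not read from the repository's config.yml, and `yolo_output_shapes` is a hypothetical helper.

```python
def yolo_output_shapes(input_size=416, num_anchors=3, num_classes=20):
    """Return the (height, width, depth) of the three YOLOv3 output maps."""
    # Per anchor: 4 box coordinates + 1 objectness score + one score per class.
    depth = num_anchors * (4 + 1 + num_classes)
    # The three heads sit at total strides of 32, 16 and 8.
    return [(input_size // stride, input_size // stride, depth)
            for stride in (32, 16, 8)]


print(yolo_output_shapes())
# [(13, 13, 75), (26, 26, 75), (52, 52, 75)]
```

With a different class count the depth changes accordingly, for example 3 * (5 + 80) = 255 for an 80-class dataset.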

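IOU.py computes the IOU used to filter anchors, in the manner of NMS. Its result feeds into the final loss computation in get_loss.py. IOU is explained first.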

import tensorflow as tf

def IOU_calculator( x, y, width, height, l_x, l_y, l_width, l_height ):
    '''
    x, y, width, height: center coordinates, width and height of the predicted box.
    l_x, l_y, l_width, l_height: center coordinates, width and height of the ground-truth box.
    All arguments are scalar tensors.
    '''
    # Edges of the predicted box: min = center - size / 2, max = center + size / 2
    x_max = x + width / 2
    y_max = y + height / 2
    x_min = x - width / 2
    y_min = y - height / 2

    # Edges of the ground-truth box, computed from its own width and height
    l_x_max = l_x + l_width / 2
    l_y_max = l_y + l_height / 2
    l_x_min = l_x - l_width / 2
    l_y_min = l_y - l_height / 2

    # Width and height of the intersection; negative when the boxes do not overlap
    xend = tf.minimum( x_max, l_x_max )
    xstart = tf.maximum( x_min, l_x_min )

    yend = tf.minimum( y_max, l_y_max )
    ystart = tf.maximum( y_min, l_y_min )

    area_width = xend - xstart
    area_height = yend - ystart

    # IOU = (A & B) / (A + B - A & B)
    area = area_width * area_height
    union = width * height + l_width * l_height - area

    # Guard against a non-positive union before dividing
    all_area = tf.cond( union <= 0, lambda : tf.cast( 1e-8, tf.float32 ), lambda : union )

    IOU = area / all_area

    # If the boxes do not overlap along either axis, return 1e-8 instead
    IOU = tf.cond( area_width < 0, lambda : tf.cast( 1e-8, tf.float32 ), lambda : IOU )
    IOU = tf.cond( area_height < 0, lambda : tf.cast( 1e-8, tf.float32 ), lambda : IOU )

    return IOU
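
To make the arithmetic concrete, here is a minimal plain-Python sketch of the same center-format IOU, with no TensorFlow dependency. The function name `iou_center_boxes` is hypothetical, and unlike the listing above it returns 0.0 rather than 1e-8 for non-overlapping boxes.

```python
def iou_center_boxes(box_a, box_b):
    """IOU of two boxes given as (center_x, center_y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap along each axis: smallest max edge minus largest min edge.
    inter_w = min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2)
    inter_h = min(ay + ah / 2, by + bh / 2) - max(ay - ah / 2, by - bh / 2)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union


# Two unit squares offset by half a side in both directions:
# intersection 0.5 * 0.5 = 0.25, union 1 + 1 - 0.25 = 1.75, so IOU = 1/7.
print(iou_center_boxes((0.5, 0.5, 1.0, 1.0), (1.0, 1.0, 1.0, 1.0)))
```

Checking both the width and the height before multiplying matters: if both were negative, their product would be positive and would look like a valid intersection area.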

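get_loss.py computes the loss. Its loss follows the YOLOv1 formulation (squared errors throughout), which is questionable for a YOLOv3 reimplementation.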

import tensorflow as tf

def objectness_loss( input, switch, l_switch, alpha = 0.5 ):
    '''
    input: the IOU.
    switch: 1 if the box is predicted to contain an object, otherwise 0.
    l_switch: 1 if the box actually contains an object, otherwise 0.
    '''

    IOU_loss = tf.square( l_switch - input * switch )  # input * switch is the confidence score C
    loss_max = tf.square( l_switch * 0.5 - input * switch )

    # Treat the loss as negligible when C is closer to the full target than to half of it
    IOU_loss = tf.cond( IOU_loss < loss_max, lambda : tf.cast( 1e-8, tf.float32 ), lambda : IOU_loss )

    # Down-weight boxes that contain no object by alpha
    IOU_loss = tf.cond( l_switch < 1, lambda : IOU_loss * alpha, lambda : IOU_loss )

    return IOU_loss

def location_loss( x, y, width, height, l_x, l_y, l_width, l_height, alpha = 5 ):
    # Squared error on the box center, weighted by alpha
    point_loss = ( tf.square( l_x - x ) + tf.square( l_y - y ) ) * alpha
    # Squared error on the square roots of width and height, weighted by alpha
    size_loss = ( tf.square( tf.sqrt( l_width ) - tf.sqrt( width ) ) + tf.square( tf.sqrt( l_height ) - tf.sqrt( height ) ) ) * alpha

    location_loss = point_loss + size_loss

    return location_loss

def class_loss( inputs, labels ):
    # Sum of squared errors over the class scores
    classloss = tf.square( labels - inputs )
    loss_sum = tf.reduce_sum( classloss )

    return loss_sum
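
To see the numbers these terms produce, the sketch below mirrors location_loss and class_loss in plain Python and adds a binary cross-entropy class loss for comparison. The weights 5 and 0.5 in the listing match the coordinate and no-object weights of the YOLOv1 paper, and the YOLOv3 paper describes binary cross-entropy for class predictions. The function names and the sample values here are hypothetical and do not come from get_loss.py.

```python
import math


def location_loss_py(pred, label, alpha=5.0):
    """pred and label are (x, y, width, height) with non-negative sizes."""
    x, y, w, h = pred
    l_x, l_y, l_w, l_h = label
    point = (l_x - x) ** 2 + (l_y - y) ** 2
    size = (math.sqrt(l_w) - math.sqrt(w)) ** 2 + (math.sqrt(l_h) - math.sqrt(h)) ** 2
    return alpha * (point + size)


def class_loss_sse(probs, labels):
    """Sum of squared errors, as in the class_loss listing above."""
    return sum((l - p) ** 2 for p, l in zip(probs, labels))


def class_loss_bce(probs, labels, eps=1e-7):
    """Binary cross-entropy summed over classes, shown only for comparison."""
    total = 0.0
    for p, l in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= l * math.log(p) + (1 - l) * math.log(1 - p)
    return total


# Center off by 0.1 in x; sqrt(width) and sqrt(height) each off by 0.1:
# 5 * (0.01 + 0.01 + 0.01) = 0.15 up to floating-point rounding.
print(location_loss_py((0.4, 0.5, 0.16, 0.16), (0.5, 0.5, 0.25, 0.25)))
print(class_loss_sse([0.9, 0.1], [1, 0]))
print(class_loss_bce([0.9, 0.1], [1, 0]))
```

Taking square roots of the width and height makes a given absolute error count for more on a small box than on a large one, which is the motivation given in the YOLOv1 paper.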

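Next is extract_labels.py, the program that extracts the training data. You can download the Pascal VOC dataset and read the data by following its format. This is fairly tedious, but it is also the biggest difference between the training program and the prediction program, and it is the highlight of this codebase; I personally do not think the other parts are implemented very well. The reader can load both the class labels and the bounding-box information.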
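
As an illustration of what reading Pascal VOC labels involves, here is a minimal sketch that parses one VOC-style XML annotation with the standard library and converts each box to normalized center format. The sample annotation is made up, `parse_voc_annotation` is a hypothetical name, and the normalized (x, y, width, height) output is an assumption about what a YOLO-style reader needs, not a description of extract_labels.py.

```python
import xml.etree.ElementTree as ET

# A made-up annotation in the Pascal VOC layout: image size plus one object.
SAMPLE_XML = """
<annotation>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>dog</name>
    <bndbox><xmin>100</xmin><ymin>75</ymin><xmax>300</xmax><ymax>225</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Return [(class_name, center_x, center_y, width, height), ...] in [0, 1]."""
    root = ET.fromstring(xml_text.strip())
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        bb = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(bb.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((
            name,
            (xmin + xmax) / 2 / img_w,   # normalized center x
            (ymin + ymax) / 2 / img_h,   # normalized center y
            (xmax - xmin) / img_w,       # normalized width
            (ymax - ymin) / img_h,       # normalized height
        ))
    return boxes


print(parse_voc_annotation(SAMPLE_XML))
```

For real files, `ET.parse(path).getroot()` would replace `ET.fromstring`, and the class name would still need to be mapped to an index before it can be used as a label.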

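This is only a rough read-through of the code. I could find just this one TensorFlow implementation, and I do not think it is very good, so if anyone knows of a better reimplementation, please share it. It also shows how important it is to learn C++ well, since then you could read the original source directly.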
