[Localization] MobileNet with SSD


First, a quick showcase of each variant's performance:

Pre-trained Models

Choose the right MobileNet model to fit your latency and size budget. The size of the network in memory and on disk is proportional to the number of parameters. The latency and power usage of the network scales with the number of Multiply-Accumulates (MACs) which measures the number of fused Multiplication and Addition operations. These MobileNet models have been trained on the ILSVRC-2012-CLS image classification dataset. Accuracies were computed by evaluating using a single image crop.

| Model Checkpoint | Million MACs | Million Parameters | Top-1 Accuracy | Top-5 Accuracy |
|---|---|---|---|---|
| MobileNet_v1_1.0_224 | 569 | 4.24 | 70.7 | 89.5 |
| MobileNet_v1_1.0_192 | 418 | 4.24 | 69.3 | 88.9 |
| MobileNet_v1_1.0_160 | 291 | 4.24 | 67.2 | 87.5 |
| MobileNet_v1_1.0_128 | 186 | 4.24 | 64.1 | 85.3 |
| MobileNet_v1_0.75_224 | 317 | 2.59 | 68.4 | 88.2 |
| MobileNet_v1_0.75_192 | 233 | 2.59 | 67.4 | 87.3 |
| MobileNet_v1_0.75_160 | 162 | 2.59 | 65.2 | 86.1 |
| MobileNet_v1_0.75_128 | 104 | 2.59 | 61.8 | 83.6 |
| MobileNet_v1_0.50_224 | 150 | 1.34 | 64.0 | 85.4 |
| MobileNet_v1_0.50_192 | 110 | 1.34 | 62.1 | 84.0 |
| MobileNet_v1_0.50_160 | 77 | 1.34 | 59.9 | 82.5 |
| MobileNet_v1_0.50_128 | 49 | 1.34 | 56.2 | 79.6 |
| MobileNet_v1_0.25_224 | 41 | 0.47 | 50.6 | 75.0 |
| MobileNet_v1_0.25_192 | 34 | 0.47 | 49.0 | 73.6 |
| MobileNet_v1_0.25_160 | 21 | 0.47 | 46.0 | 70.7 |
| MobileNet_v1_0.25_128 | 14 | 0.47 | 41.3 | 66.2 |

The first two are still acceptable in size.

 

Here is an example of how to download the MobileNet_v1_1.0_224 checkpoint:

$ CHECKPOINT_DIR=/tmp/checkpoints
$ mkdir ${CHECKPOINT_DIR}
$ wget http://download.tensorflow.org/models/mobilenet_v1_1.0_224_2017_06_14.tar.gz
$ tar -xvf mobilenet_v1_1.0_224_2017_06_14.tar.gz
$ mv mobilenet_v1_1.0_224.ckpt.* ${CHECKPOINT_DIR}
$ rm mobilenet_v1_1.0_224_2017_06_14.tar.gz

 

The code lives here; worth a closer study later:

models/research/slim/nets/mobilenet_v1.py
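For reference, a minimal sketch of restoring the downloaded checkpoint with that module, assuming TF 1.x and that models/research/slim is on PYTHONPATH (the 1001 classes are ImageNet's 1000 plus a background class):

import tensorflow as tf
from nets import mobilenet_v1  # models/research/slim/nets/mobilenet_v1.py

slim = tf.contrib.slim
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
with slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope()):
    # Build the classifier graph; is_training=False freezes BatchNorm.
    logits, end_points = mobilenet_v1.mobilenet_v1(
        images, num_classes=1001, is_training=False)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, '/tmp/checkpoints/mobilenet_v1_1.0_224.ckpt')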

 


 

Standard convolution --> factorized into a single-filter-per-channel (depthwise) convolution, then combined across channels by a 1×1 (pointwise) convolution at each pixel. This can cut computation by close to 10×!
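That figure follows directly from the paper's cost accounting (notation as in the MobileNet paper: kernel size $D_K$, feature map size $D_F$, input/output channels $M$, $N$):

$$\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}{D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F} = \frac{1}{N} + \frac{1}{D_K^2}$$

With 3×3 kernels ($D_K = 3$) and a reasonably large $N$, the ratio is about 1/9, i.e. 8 to 9 times fewer multiply-adds than standard convolution.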

Ref: http://www.jianshu.com/p/072faad13145

Ref: http://blog.csdn.net/jesse_mx/article/details/70766871

 

Abstract

  • Uses depthwise separable convolutions to build a streamlined architecture of lightweight deep neural networks.
  • The paper introduces two simple global hyperparameters that efficiently trade off latency against accuracy, letting model builders choose an appropriately sized model for their application based on the problem's constraints.

1. Introduction

2. Background

MobileNets focus primarily on optimizing latency, but also yield small networks. Many papers on small networks consider only size and ignore speed.

Note: research on deep networks for mobile and embedded devices currently runs in two main directions:
1) Directly design small, efficient networks and train them.
    • The Inception idea: use small convolutions.
    • Network in Network improves the traditional CNN with 1×1 convolutions, introducing the MLPConv layer; Deep Fried Convnets use the Adaptive Fastfood transform to reparameterize the weight matrices of fully connected layers.
    • SqueezeNet aims chiefly to reduce the number of model parameters, mainly by replacing 3×3 kernels with 1×1 and compressing the network with deep compression.
    • Flattened networks: flat networks designed for fast feed-forward execution.
2) Shrink, factorize, or compress a pretrained model.
    • Compression via the hashing trick; Huffman coding;
    • Distillation: a large network teaches a small one;
    • Training low-precision multipliers;
    • Binary inputs with binary weights, and so on.

 

3. MobileNet Architecture

  • The core layers of MobileNets, built from depthwise separable filters;
  • Two model-shrinking hyperparameters: the width multiplier and the resolution multiplier.

3.1. Depthwise Separable Convolution

The MobileNet model is based on depthwise separable convolutions, which factorize a standard convolution into a depthwise convolution and a 1×1 pointwise convolution.

 

3.2. Network Structure and Training

The corresponding API: https://www.tensorflow.org/versions/r0.12/api_docs/python/nn/convolution

tf.nn.depthwise_conv2d(input, filter, strides, padding, name=None)
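A minimal sketch of one depthwise separable block built from this API, assuming TF 1.x semantics (the BatchNorm and ReLU that MobileNet inserts after each step are omitted for brevity):

import tensorflow as tf

def depthwise_separable_conv(x, dw_filter, pw_filter, stride=1):
    """x: [batch, H, W, M]; dw_filter: [Dk, Dk, M, 1]; pw_filter: [1, 1, M, N]."""
    # Depthwise step: one Dk x Dk filter applied to each input channel.
    x = tf.nn.depthwise_conv2d(x, dw_filter,
                               strides=[1, stride, stride, 1], padding='SAME')
    # Pointwise step: 1x1 convolution mixes the M channels into N outputs.
    return tf.nn.conv2d(x, pw_filter, strides=[1, 1, 1, 1], padding='SAME')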

 


3.3. Width Multiplier

The first hyperparameter is the width multiplier α.

To build smaller and computationally cheaper networks, the width multiplier α scales the number of input and output channels, reducing the number of feature maps and making the network thinner.

Under α, the computational cost of a MobileNet layer becomes:

$D_K \times D_K \times \alpha M \times D_F \times D_F + \alpha M \times \alpha N \times D_F \times D_F$

Here α ∈ (0, 1]. Applying the width multiplier reduces the computational cost further, by roughly a factor of α².


3.4. Resolution Multiplier

The second hyperparameter is the resolution multiplier ρ,

used to scale the resolution of the input layer; this likewise reduces the computational cost (note that, unlike α, it does not reduce the parameter count).

With α and ρ applied together, the cost of a MobileNet layer becomes:

$D_K \times D_K \times \alpha M \times \rho D_F \times \rho D_F + \alpha M \times \alpha N \times \rho D_F \times \rho D_F$

Here ρ is set implicitly: ρ ∈ {1, 6/7, 5/7, 4/7} corresponds to input resolutions {224, 192, 160, 128}, and it shrinks the computation by roughly a factor of ρ².
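A quick sanity check of these formulas in Python, with hypothetical layer sizes chosen purely for illustration:

def separable_macs(Dk, Df, M, N, alpha=1.0, rho=1.0):
    # Layer cost of a depthwise separable convolution under alpha and rho.
    M, N, Df = alpha * M, alpha * N, rho * Df
    depthwise = Dk * Dk * M * Df * Df   # per-channel spatial filtering
    pointwise = M * N * Df * Df         # 1x1 cross-channel combination
    return depthwise + pointwise

full = separable_macs(3, 14, 512, 512)                      # ~52.3M MACs
slim = separable_macs(3, 14, 512, 512, alpha=0.5, rho=5/7)  # ~6.8M MACs
print(slim / full)  # ~0.13, close to alpha^2 * rho^2 = 0.128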

 

 

Those are the key points.

 


 

 

Final Report: Towards Real-time Detection and Camera Triggering

Experiments with various lightweight networks on a Raspberry Pi; it also discusses approaches to pruning networks and offers plenty of useful leads.

It also covers methods for fine-tuning the MobileNet architecture.

 

The main thing that makes MobileNets stand out is their use of the depthwise separable convolution (DSC) layer.

The intuition behind DSC: studies [2] have shown that DSC can be treated as an extreme case of the Inception module.


The structure of Mobile-Det is similar to ssd-vgg-300, the original SSD framework (see [Localization] SSD - Single Shot MultiBox Detector).

  • The difference is that the backbone is now MobileNets rather than VGG,
  • and all of the subsequently added convolutions are replaced with depthwise separable convolutions (a sketch follows below).
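A hypothetical sketch (not the report's actual code) of what one such replaced SSD "extra" feature layer might look like, reusing the depthwise_separable_conv helper from section 3.2:

def ssd_extra_layer(x, name, M, N, stride=2):
    # Stride-2 separable block in place of SSD's standard 3x3 extra conv.
    with tf.variable_scope(name):
        dw = tf.get_variable('depthwise', [3, 3, M, 1])   # depthwise 3x3
        pw = tf.get_variable('pointwise', [1, 1, M, N])   # pointwise 1x1
        return tf.nn.relu(depthwise_separable_conv(x, dw, pw, stride=stride))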

 

The benefits of using the SSD framework are evident:

we now have a unified model that can be trained end-to-end;

we no longer rely on a reference frame, and hence on temporal information, which broadens our application scenarios;

and it is also more accurate in theory.

However, the main issue is that the model becomes very slow, as a large number of convolution operations are added.

 

In this project, we tested a variety of detection models, including the state-of-the-art YOLOv2 and two of our newly proposed models: Temporal Detection and Mobile-Det.
We conclude that current object detection methods, although accurate, are far from deployable in real-world applications due to their large model size and slow speed.

Our work on Mobile-Det shows that the combination of SSD and MobileNet offers a feasible and promising direction for a faster detection framework.

Finally, we demonstrate the power of temporal information and show that difference-based region proposal can drastically increase detection speed.

 

7. Future work
There are a few aspects that could potentially improve performance but remain unimplemented due to limited time, including:
• Implement an efficient 8-bit inference module in C++ to better exploit the speed-up that quantization brings to the inference step on small devices.
• Try combining purpose-designed CNN modules, such as the MobileNet module and the fire module, with other standard real-time detection frameworks.

[The fire layer does not seem to help; experiments have already tested this.]

 

Future topics:

  • How to train the model to recognize two or more kinds of objects.
  • Using traditional (hand-crafted) operators to generate training sets.

 

