MobileNetV3-SSD Detection in Practice


1. MobileNetV3 (Searching for MobileNetV3)

    The paper was published in May 2019 by authors from Google.

    Paper link: https://arxiv.org/pdf/1905.02244.pdf

    Code: https://paperswithcode.com/paper/searching-for-mobilenetv3

    Abstract:

    We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search (NAS) complemented by the NetAdapt algorithm and then subsequently improved through novel architecture advances. This paper starts the exploration of how automated search algorithms and network design can work together to harness complementary approaches improving the overall state of the art. Through this process we create two new MobileNet models for release: MobileNetV3-Large and MobileNetV3-Small which are targeted for high and low resource use cases. These models are then adapted and applied to the tasks of object detection and semantic segmentation. For the task of semantic segmentation (or any dense pixel prediction), we propose a new efficient segmentation decoder Lite Reduced Atrous Spatial Pyramid Pooling (LR-ASPP). We achieve new state of the art results for mobile classification, detection and segmentation. MobileNetV3-Large is 3.2% more accurate on ImageNet classification while reducing latency by 20% compared to MobileNetV2. MobileNetV3-Small is 6.6% more accurate compared to a MobileNetV2 model with comparable latency. MobileNetV3-Large detection is over 25% faster at roughly the same accuracy as MobileNetV2 on COCO detection. MobileNetV3-Large LR-ASPP is 34% faster than MobileNetV2 R-ASPP at similar accuracy for Cityscapes segmentation.

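      Note: if a long English passage is hard to follow on the first read, break each sentence down into its grammatical parts; this makes it quicker to extract the intended meaning.

      For help understanding the full paper, see: https://cloud.tencent.com/developer/article/1467101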

 

2. SSD (Single Shot MultiBox Detector)

    The paper was published in December 2015.

    Paper link: https://arxiv.org/pdf/1512.02325.pdf

    Code: https://github.com/weiliu89/caffe/tree/ssd

    Abstract:

    We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. Our SSD model is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stage and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, MS COCO, and ILSVRC datasets confirm that SSD has comparable accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. Compared to other single stage methods, SSD has much better accuracy, even with a smaller input image size. For 300×300 input, SSD achieves 72.1% mAP on VOC2007 test at 58 FPS on a Nvidia Titan X and for 500×500 input, SSD achieves 75.1% mAP, outperforming a comparable state of the art Faster R-CNN model.

 

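3. MobileNetV3-SSD code analysis and experiments

    Code links: https://github.com/shaoshengsong/MobileNetV3-SSD and https://github.com/tongyuhome/MobileNetV3-SSD

    How the two are combined: two feature layers are taken from MobileNetV3, the outputs of four additional convolutional layers are appended to them, and classification and bounding-box regression are performed on the resulting six feature maps.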
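
    The combination of two backbone feature maps and four extra ones can be made concrete with a little arithmetic. The sketch below is not taken from either repository. It assumes a 300×300 input, square feature maps of sizes 19, 10, 5, 3, 2 and 1, 6 default boxes per location, and 21 classes (PASCAL VOC plus background). The helper name head_summary is hypothetical.

```python
def head_summary(feature_sizes, boxes_per_location, num_classes):
    """Summarize what SSD prediction heads output for square feature maps."""
    # Every location on every feature map carries `boxes_per_location` default boxes.
    boxes_per_map = [s * s * boxes_per_location for s in feature_sizes]
    return {
        # Classification head: one score per class for each default box.
        "cls_out_channels": boxes_per_location * num_classes,
        # Regression head: four box offsets for each default box.
        "reg_out_channels": boxes_per_location * 4,
        "boxes_per_map": boxes_per_map,
        "total_boxes": sum(boxes_per_map),
    }

# Assumed values: six feature maps, 6 boxes per location, 21 classes.
summary = head_summary([19, 10, 5, 3, 2, 1], boxes_per_location=6, num_classes=21)
print(summary["boxes_per_map"])     # [2166, 600, 150, 54, 24, 6]
print(summary["total_boxes"])       # 3000
print(summary["cls_out_channels"])  # 126
print(summary["reg_out_channels"])  # 24
```

    Under these assumptions each feature map gets its own pair of heads, and their outputs are concatenated into 3000 class-score vectors and 3000 box offsets per image.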

    Point to note in practice: the code selects the backbone feature layers through source_layer_indexes. For mobilenetv3-small it uses

        source_layer_indexes = [GraphPath(11, 'conv', -1),
                                # 11,
                                20]

    and for mobilenetv3-large it uses

        source_layer_indexes = [GraphPath(16, 'conv', -1),
                                # 11,
                                22]
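
    What an entry such as GraphPath(16, 'conv', -1) means is not spelled out here. One plausible reading, modeled on how pytorch-ssd-style code treats such entries and not verified against the two linked repositories, is that a plain integer taps the output after running the backbone blocks up to that index, while a GraphPath taps a feature from inside a block. The toy sketch below uses that assumed interpretation. The GraphPath field names and the helpers make_block, run_block and collect_sources are all hypothetical, and the "layers" only record their own labels instead of computing anything.

```python
from collections import namedtuple

# Assumed definition: s0 = index of a backbone block, name = key of that
# block's sub-layers, s1 = position inside the sub-layers where the tap sits.
GraphPath = namedtuple("GraphPath", ["s0", "name", "s1"])

def make_block(block_id, num_layers=3):
    """Toy block: each 'layer' appends its own label to a trace tuple."""
    def make_layer(layer_id):
        return lambda trace: trace + (f"b{block_id}.l{layer_id}",)
    return {"conv": [make_layer(i) for i in range(num_layers)]}

def run_block(block, trace):
    for layer in block["conv"]:
        trace = layer(trace)
    return trace

def collect_sources(blocks, source_layer_indexes):
    """Return the trace seen at each tapped position."""
    trace, start, taps = (), 0, []
    for entry in source_layer_indexes:
        is_path = isinstance(entry, GraphPath)
        end = entry.s0 if is_path else entry
        for block in blocks[start:end]:          # run whole blocks before the tap
            trace = run_block(block, trace)
        start = end
        if is_path:
            sub_layers = blocks[end][entry.name]
            for layer in sub_layers[:entry.s1]:  # layers before the tap
                trace = layer(trace)
            taps.append(trace)
            for layer in sub_layers[entry.s1:]:  # finish the tapped block
                trace = layer(trace)
            start = end + 1
        else:
            taps.append(trace)
    return taps

blocks = [make_block(i) for i in range(4)]
taps = collect_sources(blocks, [GraphPath(2, "conv", -1), 4])
print(taps[0][-1])  # b2.l1 (tapped before the last layer of block 2)
print(taps[1][-1])  # b3.l2 (tapped after block 3 has finished)
```

    With s1 = -1 the tap sits just before the last sub-layer of the chosen block, which is why the first tap ends at b2.l1 rather than b2.l2.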

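    Practical improvement: with the settings above, the first feature map of the SSD branch is 19*19. To make the SSD feature maps start at 38*38 followed by 19*19, taking mobilenetv3-large as an example, set:

        source_layer_indexes = [GraphPath(10, 'conv', -1), GraphPath(16, 'conv', -1),
                                # 11,
                                22]

    and then modify the channel counts of the classification and regression parts to match.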
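
    The 38*38 and 19*19 sizes follow from how often the input is downsampled. As a rough check, the sketch below assumes a 300×300 input and that every downsampling step is a 3×3 convolution with stride 2 and padding 1. This is an assumption about the network, not something read from the repositories, and the 6 default boxes per location is likewise an assumed value. The helper names are hypothetical.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution (standard floor formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def downsample_chain(input_size, steps):
    """Sizes after each of `steps` successive stride-2 convolutions."""
    sizes, size = [], input_size
    for _ in range(steps):
        size = conv_out(size)
        sizes.append(size)
    return sizes

def total_boxes(sizes, boxes_per_location=6):
    """Number of default boxes over a list of square feature maps."""
    return sum(s * s * boxes_per_location for s in sizes)

chain = downsample_chain(300, 9)
print(chain)  # [150, 75, 38, 19, 10, 5, 3, 2, 1]

default_maps = chain[3:]   # starts at 19x19: six feature maps
extended_maps = chain[2:]  # starts at 38x38: seven feature maps

print(total_boxes(default_maps))   # 3000
print(total_boxes(extended_maps))  # 11664
```

    Under these assumptions the 38*38 map alone adds 8664 default boxes. Because it is an additional source layer, the classification and regression heads presumably need one more entry each, with input channels equal to the channel count of the feature tapped by GraphPath(10, 'conv', -1).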

 

