[Arxiv1706] Few-Example Object Detection with Model Communication (Paper Notes)


https://arxiv.org/pdf/1706.08249.pdf

Few-Example Object Detection with Model Communication, Xuanyi Dong, Liang Zheng, Fan Ma, Yi Yang, Deyu Meng

 

Highlights

  • With only 3-4 annotated bounding boxes per class, the method achieves detection performance comparable to approaches trained on large numbers of annotated examples.
  • The main recipe: multi-modal learning (training multiple detection models jointly) + self-paced learning (a form of curriculum learning)

Related Work

This section clarifies several easily confused problem settings and the methods associated with each.

  • Weakly supervised object detection: the dataset labels are unreliable; for a pair (x, y), the label y may be incorrect, ambiguous, insufficient, or only partial for x.
    • Labels are image-level class labels [7][8][9][10][11][18][30][31][32][33][34]
  • Semi-supervised object detection: semi-supervised learning exploits a large amount of unlabeled data alongside the labeled data.
    • Some training samples carry only class labels, while others have full bounding-box and class annotations [4][5][6]
      • Still requires heavy annotation (e.g., 50% of the full annotations)
    • Only a few bounding-box annotations per class (the setting of Few-Example Object Detection with Model Communication) [12][35]
      • Difference from few-shot learning: whether unlabeled data is used during training
    • Mining location annotations from video; these methods mainly target objects that move [2][3][29][1]
  • Webly supervised object detection: reduce the annotation cost by leveraging web data

Method

Basic detectors: Faster R-CNN & R-FCN

Object proposal methods: Selective Search & Edge Boxes

Annotations: when we randomly annotate approximately four images per class, an image may contain several objects, and we annotate all of their bounding boxes.

 

Parameter Updates

Updating vj: differentiating the loss function above yields a closed-form solution for vj.

For the same image i and the same model j, if several samples would receive vj = 1, only the one minimizing Lc is set to 1 and the rest are set to 0. The gamma term drives information sharing between models: when another model's vj is 1, the threshold increases, so the image is more likely to be selected.
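The selection rule above can be sketched in NumPy. This is a minimal illustration, not the paper's exact formulation: `update_v` is my name, `losses` holds the detection loss Lc of each candidate for one model, `v_other` counts how many other models selected the corresponding image, and the threshold form `lam + gamma * v_other` is an assumed rendering of the communication term.

```python
import numpy as np

def update_v(losses, v_other, lam, gamma):
    """Closed-form self-paced selection for one detector (sketch).

    losses  : (n,) array, detection loss Lc of each candidate pseudo-box
    v_other : (n,) array, how many *other* detectors selected the image
              (the model-communication term)
    lam     : base self-paced threshold
    gamma   : communication weight; raises the threshold for images
              that other models already trust
    """
    # A candidate is selected when its loss falls below the
    # communication-adjusted threshold.
    v = (losses < lam + gamma * v_other).astype(int)
    # Per the note: if several candidates of the same image qualify for
    # the same model, keep only the one with the smallest loss.
    if v.sum() > 1:
        keep = np.argmin(np.where(v == 1, losses, np.inf))
        v = np.zeros_like(v)
        v[keep] = 1
    return v
```

The key behavior: a larger `v_other` loosens the threshold, so images agreed upon by the other detectors are selected earlier in the self-paced schedule.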

Updating wj: the same as in prior work.

Updating yuj: to update yuj, we need to find, among a set of bounding boxes, a solution satisfying the following conditions.

The optimum is hard to obtain directly. The paper's workaround: feed the predictions of all models into NMS, keep only the high-scoring results via a score threshold, and let the survivors form yuj.
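This pooled-NMS workaround can be sketched as follows. It is a minimal illustration under assumed names (`nms`, `pseudo_labels`) and assumed thresholds; the paper's exact score threshold is not given here.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Standard NMS over [x1, y1, x2, y2] boxes; returns kept indices."""
    order = scores.argsort()[::-1]  # process highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thr]  # suppress heavy overlaps
    return keep

def pseudo_labels(preds_per_model, score_thr=0.8, iou_thr=0.5):
    """Pool every model's (boxes, scores), run NMS, keep high scores."""
    boxes = np.vstack([b for b, _ in preds_per_model])
    scores = np.concatenate([s for _, s in preds_per_model])
    kept = [i for i in nms(boxes, scores, iou_thr) if scores[i] >= score_thr]
    return boxes[kept], scores[kept]
```

The survivors of `pseudo_labels` would play the role of yuj for the next training round.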

Hard-example removal: we employ a modified NMS (intersection/max(area1, area2)) to filter out nested boxes, which usually occur when there are multiple overlapping objects. If there are too many boxes (≥ 4) for one specific class, or too many classes (≥ 4) in the image, the image is removed. Images in which no reliable pseudo objects are found are also filtered out.
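A minimal sketch of these two filtering rules. The overlap measure replaces IoU inside an otherwise standard NMS loop; the function names are mine, and the thresholds follow the counts stated in the note.

```python
from collections import Counter

def modified_overlap(a, b):
    """Overlap from the note: intersection / max(area1, area2),
    used instead of IoU when filtering nested boxes.
    Boxes are [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / max(area_a, area_b)

def keep_image(labels, max_per_class=4, max_classes=4):
    """Image-level rules from the note: drop the image when no pseudo
    boxes survive, any single class has >= max_per_class boxes, or the
    image contains >= max_classes distinct classes."""
    if not labels:
        return False  # no reliable pseudo objects found
    counts = Counter(labels)
    if len(counts) >= max_classes:
        return False  # too many classes
    if max(counts.values()) >= max_per_class:
        return False  # too many boxes for one class
    return True
```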

Experiments

Compared with the state of the art (on average 4.2 images per class are annotated)

  • VOC 2007: -1.1 mAP compared with [21], correct localization +0.9%
  • VOC 2012: -2.5 mAP compared with [21], correct localization +9.8%
  • ILSVRC 2013: -2.4 mAP compared with [21]
  • COCO 2014: +1.3 mAP compared with [22]

[20] V. Kantorov, M. Oquab, M. Cho, and I. Laptev, “Contextlocnet: Context-aware deep network models for weakly supervised localization,” in European Conference on Computer Vision, 2016.
[21] A. Diba, V. Sharma, A. Pazandeh, H. Pirsiavash, and L. Van Gool, “Weakly supervised cascaded convolutional networks,” in Computer Vision and Pattern Recognition, 2017.
[22] Y. Zhu, Y. Zhou, Q. Ye, Q. Qiu, and J. Jiao, “Soft proposal networks for weakly supervised object localization,” in International Conference on Computer Vision, 2017.

Ablation study

  • VOC 2007: +4.1 mAP compared with a plain model ensemble
  • k: number of labeled images per class; "w/ image labels": image-level supervision incorporated


Limitations

Although localization accuracy is reasonable, many hard images are missed entirely (i.e., few-example classification performance is poor).

 

