Shangxuan Tian, [ICCV2017] WeText: Scene Text Detection under Weak Supervision

Authors and related links

Authors

Paper download

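Highlights

Uses semi-supervised and weakly supervised learning to train the character classifier, which addresses the shortage of character-level annotations
Uses a regression approach to learn the character classifier, with proposal generation and text/non-text classification integrated into a single network (a less notable highlight than the first)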

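Method

Detection pipeline
- Detect characters with SSD (the paper's contribution lies in how this SSD is trained)
- Link the characters into words with the graph model from TextFlow and output them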

Figure 2: The framework of the proposed WeText system: A “light” supervised model is pre-trained using a small amount of annotated character image set. The light model is then applied to an unannotated dataset to search for more character samples which are combined with the small annotated dataset to train a semi-supervised model. Under certain weak annotations, better character samples can be searched to train a semi-supervised model

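Semi-supervised method for training the SSD
- Train a light base model (denoted M) on a small dataset (denoted D) in a supervised manner
- Run M over a large unannotated dataset (denoted R) and take the samples scoring above the threshold (0.5) as positive samples (denoted dataset P)
- Train a new model (denoted M') on dataset D plus dataset P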
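
The selection step of the semi-supervised procedure can be sketched roughly as follows. This is only an illustration: the `Detection` type and the function names `select_pseudo_positives` and `build_training_set` are hypothetical and not taken from the paper, and the detector M itself is assumed to exist elsewhere and to return scored boxes.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (x_min, y_min, x_max, y_max); hypothetical box layout for this sketch
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box
    score: float  # confidence assigned by the base model M


def select_pseudo_positives(detections: List[Detection],
                            threshold: float = 0.5) -> List[Detection]:
    """Keep detections scoring strictly above the threshold (dataset P)."""
    return [d for d in detections if d.score > threshold]


def build_training_set(annotated: List[Detection],
                       pseudo: List[Detection]) -> List[Detection]:
    """Combine the small annotated set D with the mined set P."""
    return list(annotated) + list(pseudo)
```

The new model M' would then be trained on the output of `build_training_set`; whether that training starts from scratch or from M is not stated in these notes.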
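Weakly supervised method for training the SSD
- Train a light base model (denoted M) on a small dataset (denoted D) in a supervised manner
- Run M over a large dataset that has word-level annotations (denoted R') and take the samples that score above the threshold (0.2) and overlap a word-level ground-truth box (horizontal and vertical IoU thresholds of 0.8) as positive samples (denoted dataset P')
- Train a new model (denoted M'') on dataset D plus dataset P'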
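
A minimal sketch of the weakly supervised filter is given below. The notes do not spell out how the horizontal and vertical overlap is computed, so this sketch assumes it means the fraction of the detected box's horizontal (and vertical) extent that falls inside a word box; that interpretation, and all function names, are assumptions made for illustration.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max); hypothetical box layout for this sketch
Box = Tuple[float, float, float, float]


def covered_fraction(lo: float, hi: float, ref_lo: float, ref_hi: float) -> float:
    """Fraction of the interval [lo, hi] lying inside [ref_lo, ref_hi]."""
    length = hi - lo
    if length <= 0:
        return 0.0
    inter = min(hi, ref_hi) - max(lo, ref_lo)
    return max(inter, 0.0) / length


def matches_word(box: Box, word: Box, overlap_thr: float = 0.8) -> bool:
    """Assumed criterion: both 1-D overlaps must reach the threshold."""
    horizontal = covered_fraction(box[0], box[2], word[0], word[2])
    vertical = covered_fraction(box[1], box[3], word[1], word[3])
    return horizontal >= overlap_thr and vertical >= overlap_thr


def select_weak_positives(detections: List[Tuple[Box, float]],
                          word_boxes: List[Box],
                          score_thr: float = 0.2,
                          overlap_thr: float = 0.8) -> List[Tuple[Box, float]]:
    """Keep detections above score_thr that match some word box (dataset P')."""
    return [
        (box, score)
        for box, score in detections
        if score > score_thr
        and any(matches_word(box, w, overlap_thr) for w in word_boxes)
    ]
```

Because the word-level annotation already rules out most background detections, the score threshold can be lower here (0.2) than in the semi-supervised case (0.5).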

Method details

Comparison of several SSD models

Figure 4: Comparison of different character detectors. Images in the top row from left to right are the input image and output of the baseline detector. Images in the bottom row from left to right are outputs of “COCO-Text Semi” and “COCO-Text Weakly” detectors, respectively. The thickness of the box boundary lines indicates the detection confidence

Training
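- Base model: the character set of ICDAR2013
- FORU: FORU_Semi is semi-supervised, FORU_Weakly is weakly supervised, and FORU_GT is fully supervised (FORU itself has character-level annotations while COCO-Text does not, so there is no COCO-Text_GT). FORU_GT serves to verify that the semi-supervised and weakly supervised methods reach almost the same result as full supervision (FORU_GT is effectively the accuracy upper bound of the algorithm), which demonstrates their effectiveness;
- COCO_Text: COCO-Text has more samples than FORU, and the experiments show that the more unannotated data is used, the better the result;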

Experimental results

Speed
- Nvidia Titan X GPU
- ICDAR2013: 190 ms for the SSD model, 130 ms for the text line model, 320 ms per image in total
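
As a quick check of the timing figures, the two stage latencies can be added and converted into throughput. The variable names are only illustrative.

```python
# Per-image latencies reported for ICDAR2013 on an Nvidia Titan X GPU
ssd_ms = 190        # character detection with the SSD model
text_line_ms = 130  # text line model

total_ms = ssd_ms + text_line_ms       # sequential stages, so latencies add
images_per_second = 1000 / total_ms    # throughput at one image at a time

print(f"{total_ms} ms per image, about {images_per_second:.1f} images per second")
```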
ICDAR13

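Summary and takeaways

The biggest highlight of this paper is clearly the idea of using weak supervision to enlarge the training data, which is well worth borrowing; so although the paper has few novel points, it was still accepted to ICCV. However, the paper gives few training details, such as whether the model is retrained from scratch on the new dataset or fine-tuned from the original base model, or how the SSD anchors are configured.
Setting the highlights aside and looking only at the detection method, it has three drawbacks. First, it is fairly slow. Second, it handles only horizontal text, not multi-oriented text. Third, because it uses a character-based pipeline, it has to add the graph model from TextFlow to merge characters into text lines. This design not only needs two separate models, which lowers speed, but also accumulates errors across stages and cannot be trained end to end. The second drawback also stems from this pipeline: merging characters into multi-oriented text lines is possible in principle, but not with the TextFlow model; a new algorithm would have to be designed to replace it (which is also quite difficult).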
