Yuliang Liu_2017_Detecting Curve Text in the Wild_New Dataset and New Solution

Authors and code

Caffe implementation

Keywords

Text detection, curved text, direct regression, 14 points, one-stage, open source

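Highlights

The first paper on curved text detection; it also introduces a new dataset, CTW1500
Represents curved text with a 14-point polygon
Proposes a detection method combining CNN-RPN and RNN, designed specifically for curved text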

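Method overview

The method is a modified RPN. Besides learning text/non-text classification and regressing the polygon's bounding box (x1, y1, x2, y2), it adds regression of 14 points. Post-processing (removal of invalid polygons plus NMS) then produces the final output.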

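Method details

A polygon represents curved text better than a quadrilateral does

Network architecture

The network has three branches.

The first branch is text/non-text, an ordinary classification task
The second branch regresses x1, y1, x2, y2 of the axis-aligned circumscribed rectangle (bounding box) of the whole curved text polygon
The third branch regresses the coordinates of the 14 points. It uses R-FCN-style grid-based pooling and an RNN that adds context information to smooth the points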

Regression output

32 values in total = 14 × 2 = 28 coordinate offsets + the 4 values of the polygon's bounding box (x1, y1, x2, y2)
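
To make the 32-value layout concrete, here is a minimal sketch of packing and unpacking one polygon. It assumes the 28 offsets are measured from the top-left corner of the circumscribed rectangle and are not normalized; the function names `encode_polygon` and `decode_polygon` are hypothetical and are not taken from the released Caffe code.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def encode_polygon(points: List[Point]) -> List[float]:
    """Pack a 14-point polygon into 32 values: box (x1, y1, x2, y2) + 28 offsets."""
    if len(points) != 14:
        raise ValueError("expected exactly 14 points")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    # Axis-aligned circumscribed rectangle of the polygon.
    x1, y1, x2, y2 = min(xs), min(ys), max(xs), max(ys)
    target = [x1, y1, x2, y2]
    for x, y in points:
        # Assumption: each offset is relative to the box's top-left corner.
        target.extend([x - x1, y - y1])
    return target


def decode_polygon(target: List[float]) -> List[Point]:
    """Recover the 14 points from the 32-value layout produced above."""
    x1, y1 = target[0], target[1]
    offsets = target[4:]
    return [(x1 + offsets[i], y1 + offsets[i + 1])
            for i in range(0, len(offsets), 2)]
```

Under this layout every x offset lies in [0, x2 - x1] and every y offset in [0, y2 - y1], so the point regression stays bounded by the box size.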

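Recurrent Transverse and Longitudinal Offset Connection (TLOC)

PSROIPooling: Position-sensitive ROI Pooling, similar to R-FCN. Because the 14 points sit at different positions, this position-dependent pooling is used
The x and y offsets are split into two separate branches
Total loss = binary classification + bounding box regression + point coordinate regression

An RNN is used to smooth the points. The point coordinates can be treated as a sequence with strong context dependence: for example, the 4th point must lie to the right of the 2nd point. These implicit constraints are why an RNN can be used for smoothing.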

Independently predicting each offset may lead to unsmooth text region, and somehow it may bring more false detection. Therefore, we assume the width/height of each point has associated context information, and using RNN to learn their latent characteristics. We name this method as recurrent transverse and longitudinal offset connection (TLOC).

Comparison of results with and without TLOC

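CTW1500 dataset

1,500 images and 10,751 bounding boxes, of which 3,530 are curved bounding boxes; at least one curved text instance per image.
Data sources: Google Open-Image and samples the authors collected with their own phones
Each annotation uses 14 points

Comparison of annotation efficiency for different box types

CTW sample images

Long-side interpolation

Annotations that have only two or four points are uniformly interpolated to 14 points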

Figure 6. Visualization of the interpolation for 4 points bounding boxes. The 10 equal division points will be respectively interpolated in two Red sides of each bounding box. Green means straight line without interpolation.
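
The interpolation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the four corners are assumed to be ordered top-left, top-right, bottom-right, bottom-left, with the top and bottom edges being the long (red) sides, and the helper names are hypothetical. Each long side receives 5 equal-division points, which gives 7 points per side and 14 in total.

```python
def interpolate_side(start, end, n_inner=5):
    """Return start, n_inner equally spaced interior points, and end."""
    steps = n_inner + 1
    return [
        (start[0] + (end[0] - start[0]) * k / steps,
         start[1] + (end[1] - start[1]) * k / steps)
        for k in range(steps + 1)
    ]


def quad_to_14_points(quad):
    """Expand a 4-point box to 14 points by interpolating its two long sides.

    Assumption: quad is ordered top-left, top-right, bottom-right,
    bottom-left, and the top and bottom edges are the long sides.
    """
    tl, tr, br, bl = quad
    top = interpolate_side(tl, tr)      # 7 points, left to right
    bottom = interpolate_side(br, bl)   # 7 points, right to left
    return top + bottom                 # the short sides stay straight
```

A two-point annotation (two opposite corners of an axis-aligned rectangle) could first be expanded to its four corners and then passed through the same function.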

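Post-processing: NMS

Remove invalid polygons (for example, a polygon must not have intersecting edges)
Run polygon NMS (based on the intersection-over-union of polygons)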
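
The two post-processing steps can be sketched in plain Python as below. Several parts are assumptions made for illustration: the validity check only catches edges that properly cross, the polygon IoU is approximated by sampling a regular grid (an exact polygon-clipping routine would normally be used instead), and the 0.3 threshold and all function names are hypothetical.

```python
def _orient(a, b, c):
    """Signed area test: > 0 if c is to the left of the line a -> b."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def _segments_cross(p1, p2, p3, p4):
    """True if segments p1p2 and p3p4 properly cross each other."""
    return (_orient(p3, p4, p1) * _orient(p3, p4, p2) < 0 and
            _orient(p1, p2, p3) * _orient(p1, p2, p4) < 0)


def is_valid_polygon(poly):
    """Reject polygons in which two non-adjacent edges cross."""
    n = len(poly)
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # first and last edge share a vertex
            if _segments_cross(poly[i], poly[(i + 1) % n],
                               poly[j], poly[(j + 1) % n]):
                return False
    return True


def _inside(pt, poly):
    """Even-odd ray casting point-in-polygon test."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def polygon_iou(a, b, step=0.5):
    """Approximate IoU by counting grid sample points (assumption, not exact)."""
    pts = list(a) + list(b)
    x_min, x_max = min(p[0] for p in pts), max(p[0] for p in pts)
    y_min, y_max = min(p[1] for p in pts), max(p[1] for p in pts)
    inter = union = 0
    y = y_min + step / 2
    while y < y_max:
        x = x_min + step / 2
        while x < x_max:
            in_a, in_b = _inside((x, y), a), _inside((x, y), b)
            inter += in_a and in_b
            union += in_a or in_b
            x += step
        y += step
    return inter / union if union else 0.0


def polygon_nms(polys, scores, iou_thresh=0.3):
    """Drop invalid polygons, then greedily keep the highest-scoring ones."""
    order = sorted((i for i in range(len(polys)) if is_valid_polygon(polys[i])),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(polygon_iou(polys[i], polys[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep
```

`polygon_nms` returns the indices of the kept polygons in descending score order. The grid approximation handles non-convex polygons at the cost of speed, which is acceptable for a sketch but not for production use.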

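Experimental results

CTW1500
Results with TLOC and NMS

Example detection results

Summary and takeaways

CTD, from South China University of Technology, is the first work on curved text detection. Earlier work had dealt with curved text, but mainly for recognition; this is the first to address detection, and it also introduces CTW1500, a dataset dedicated to curved text. The core of the method is to represent every curved text instance uniformly with 14 points. Because adjacent points should be contextually related (neighboring points cannot drift far apart), the authors use an RNN for smoothing.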
