Chuhui Xue: [arXiv 2019] MSR: Multi-Scale Shape Regression for Scene Text Detection

Paper

Authors

Chuhui Xue, Shijian Lu, Wei Zhang

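Highlights

In the multi-scale network, FPN-style up-sampling is used to fuse the outputs obtained at several different scales (concat + up-pooling).
The boundary-point regression branch directly predicts the dx and dy between each pixel and its nearest boundary point; the idea is clear and easy to implement.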

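Method Overview

Targeting arbitrary-shaped text detection (horizontal, oriented, and curved text), the network regresses the text boundary points, from which the text region is obtained.

The overall detection pipeline consists of:

Feature extraction: a multi-channel, multi-scale network similar to an image pyramid extracts image features at different scales (FPN framework).
Prediction: three output branches
- a text-region classification branch
- the x distance to the nearest boundary point
- the y distance to the nearest boundary point
Output: the Alpha-Shape Algorithm is applied to the boundary point set to obtain the enclosing outer polygon.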
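The output step can be sketched as follows, assuming the dx/dy maps store the offset from each pixel to its nearest boundary point (boundary minus pixel) and using a plain convex hull, which is the α → ∞ limit of an alpha shape, as a simplified stand-in for the Alpha-Shape Algorithm. The function names and the 0.5 threshold are illustrative choices, not taken from the paper.

```python
import numpy as np

def decode_boundary_points(score_map, dx_map, dy_map, thresh=0.5):
    """Shift every text pixel by its predicted offset to get boundary points.

    Assumption: dx/dy hold (boundary - pixel) offsets in pixel units.
    """
    ys, xs = np.nonzero(score_map > thresh)  # pixels classified as text
    bx = xs + dx_map[ys, xs]                 # x of the regressed boundary point
    by = ys + dy_map[ys, xs]                 # y of the regressed boundary point
    return list(zip(bx.tolist(), by.tolist()))

def convex_hull(points):
    """Andrew's monotone chain; returns the ordered hull vertices."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop the duplicated end points
```

For curved text a convex hull would also cover the concave side of the arc; an alpha shape with a finite α avoids that.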

Fig. 1: Scene text detection using the proposed multi-scale shape regression network (MSR): For scene texts with arbitrary orientations and shapes in (a), MSR first predicts dense text boundary points (in red color) as shown in (b) and then locates texts by a polygon (in green color) that encloses all boundary points of each text instance as shown in (c).

Method Details

Multi-scale Network

Fig. 3: Structure of proposed multi-scale network (for two-scale case): Features extracted from layers Conv2 - Conv5 of two network channels are fused, where features of the same scale are fused by a Concat UpConv as illustrated and features from the deepest layer of the lower-scale channel are up-sampled to the scale of the previous layer for fusion.
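A shape-only sketch of the two-scale fusion in Fig. 3, assuming a ResNet-style backbone with strides 4/8/16/32 at Conv2-Conv5 and a second channel fed the image downscaled by 2. The learned UpConv layers are omitted: nearest-neighbour up-pooling plus channel concatenation stand in for them, so the channel count simply grows. It illustrates why the half-resolution channel's Conv_k aligns with the full-resolution channel's Conv_{k+1}. All function names are hypothetical, and the input size must be a multiple of 64.

```python
import numpy as np

# Assumed backbone strides for Conv2..Conv5 (not stated in the note).
STRIDES = {"conv2": 4, "conv3": 8, "conv4": 16, "conv5": 32}

def backbone_stub(h, w, channels=4):
    """Shape-only stand-in for one network channel: zero maps of shape (C, H/s, W/s)."""
    return {name: np.zeros((channels, h // s, w // s)) for name, s in STRIDES.items()}

def up2(x):
    """Nearest-neighbour 2x up-pooling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def two_scale_fusion(h, w):
    full = backbone_stub(h, w)            # channel 1: original image
    half = backbone_stub(h // 2, w // 2)  # channel 2: image downscaled by 2
    # Deepest map of the low-scale channel is up-sampled to the previous scale.
    top = np.concatenate([full["conv5"], half["conv4"], up2(half["conv5"])], axis=0)
    # Same-scale pairs: half-res conv_k has the same size as full-res conv_{k+1}.
    same_scale = [
        np.concatenate([full["conv4"], half["conv3"]], axis=0),
        np.concatenate([full["conv3"], half["conv2"]], axis=0),
        full["conv2"],
    ]
    out = top
    for lateral in same_scale:  # FPN-like top-down pass: up-pool, then concat
        out = np.concatenate([up2(out), lateral], axis=0)
    return out  # stride-4 fused feature map
```

With a 256 x 256 input this yields a (32, 64, 64) map, i.e. a 1/4-resolution output like EAST's.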

Alpha-Shape Algorithm
- Reference: N. Akkiraju, H. Edelsbrunner, M. Facello, P. Fu, E. Mucke, and C. Varela, "Alpha shapes: definition and software," in Proceedings of the 1st International Computational Geometry Software Workshop, vol. 63, 1995, p. 66.
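A brute-force sketch of 2-D alpha-shape boundary edges, based on the empty-disk definition rather than the Delaunay-based construction used in the cited software. Here α is treated as a disk radius (some references use the squared radius), and the O(n³) loop is only meant for small point sets.

```python
import math
from itertools import combinations

def alpha_shape_edges(points, alpha):
    """Return index pairs (i, j) lying on the boundary of the alpha shape."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        d = math.hypot(x2 - x1, y2 - y1)
        if d == 0 or d > 2 * alpha:
            continue  # no circle of radius alpha passes through both points
        mx, my = (x1 + x2) / 2, (y1 + y2) / 2
        h = math.sqrt(max(0.0, alpha * alpha - (d / 2) ** 2))  # midpoint-to-centre distance
        ux, uy = -(y2 - y1) / d, (x2 - x1) / d                # unit normal of segment pq
        for cx, cy in ((mx + h * ux, my + h * uy), (mx - h * ux, my - h * uy)):
            # (p, q) is alpha-exposed if this disk has no other point strictly inside
            if all(math.hypot(px - cx, py - cy) >= alpha - 1e-9
                   for k, (px, py) in enumerate(points) if k not in (i, j)):
                edges.append((i, j))
                break
    return edges
```

Chaining the returned edges gives the enclosing polygon; a smaller α follows concavities more closely but can break a sparse point set into pieces.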
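Ground-truth generation
- Triangulate the annotation polygon into a set of triangles
- Take the points at 1/4 along the two side edges of the triangles (the green points in Fig. 4(b)) and connect them in order to obtain a shrunk text region (the blue region in Fig. 4(c))
- For each pixel in the shrunk text region, find its nearest boundary point and compute the x offset and y offset to that point, giving two maps, distance_x_map (Fig. 4(e)) and distance_y_map (Fig. 4(f))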
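A minimal sketch of this ground-truth generation under loudly stated assumptions: the annotation is given as top-boundary and bottom-boundary points with top[i] paired to bottom[i] (CTW1500-style), so explicit triangulation is skipped and the central region is built from the points 1/4 of the way along each top[i]-bottom[i] edge; offsets are stored as nearest boundary point minus pixel. All function names are hypothetical.

```python
import math

def shrink_polygon(top, bottom, ratio=0.25):
    """Assumed central region: move each paired top/bottom point `ratio` toward its partner."""
    new_top = [(t[0] + ratio * (b[0] - t[0]), t[1] + ratio * (b[1] - t[1])) for t, b in zip(top, bottom)]
    new_bot = [(b[0] + ratio * (t[0] - b[0]), b[1] + ratio * (t[1] - b[1])) for t, b in zip(top, bottom)]
    return new_top + new_bot[::-1]  # closed polygon: top left-to-right, bottom right-to-left

def nearest_on_segment(p, a, b):
    """Closest point to p on segment ab (projection clamped to the segment)."""
    vx, vy = b[0] - a[0], b[1] - a[1]
    denom = vx * vx + vy * vy
    t = 0.0 if denom == 0 else max(0.0, min(1.0, ((p[0] - a[0]) * vx + (p[1] - a[1]) * vy) / denom))
    return (a[0] + t * vx, a[1] + t * vy)

def boundary_offsets(pixel, polygon):
    """(dx, dy) from `pixel` to its nearest point on the polygon boundary."""
    best = None
    for k in range(len(polygon)):
        q = nearest_on_segment(pixel, polygon[k], polygon[(k + 1) % len(polygon)])
        d = math.hypot(q[0] - pixel[0], q[1] - pixel[1])
        if best is None or d < best[0]:
            best = (d, q)
    return (best[1][0] - pixel[0], best[1][1] - pixel[1])

def point_in_polygon(p, poly):
    """Even-odd ray-casting test."""
    x, y = p
    inside = False
    for k in range(len(poly)):
        (x1, y1), (x2, y2) = poly[k], poly[(k + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def gt_maps(top, bottom, height, width):
    """Score map plus distance_x / distance_y maps for one text instance."""
    annotation = top + bottom[::-1]
    center = shrink_polygon(top, bottom)
    score = [[0] * width for _ in range(height)]
    dx = [[0.0] * width for _ in range(height)]
    dy = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if point_in_polygon((x, y), center):
                score[y][x] = 1
                dx[y][x], dy[y][x] = boundary_offsets((x, y), annotation)
    return score, dx, dy
```

For a 10 x 8 box, the shrunk region keeps the middle half of the height, and a pixel at (5, 3) gets offsets (0, -3) toward the top edge.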

Fig. 4: Illustration of ground-truth generation: Given a text annotation polygon in (a), triangulation is performed over the polygon vertices to locate the vertices (green points in (b)) of the central text region in blue color in (c). For each central-text-region pixel t_p (in blue color in (d)), the nearest point on the text annotation box, b_p (in yellow color), is determined as the nearest text boundary point as shown in (d), and the distance between t_p and b_p is used to generate ground-truth distance maps as shown in (e) and (f).

Loss function
- Pixel classification (Dice coefficient)
- Regression of dx and dy to the nearest boundary point (Smooth L1)

Total loss
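A sketch of the two loss terms and one plausible way to combine them. The Dice and Smooth L1 definitions are standard; restricting the regression loss to ground-truth text pixels and the weight lam = 1.0 are assumptions, not values from the paper, and the function names are hypothetical.

```python
import numpy as np

def dice_loss(pred, gt, eps=1e-6):
    """1 - Dice coefficient between a predicted score map and a binary GT map."""
    inter = np.sum(pred * gt)
    return 1.0 - (2.0 * inter + eps) / (np.sum(pred) + np.sum(gt) + eps)

def smooth_l1(diff):
    """Element-wise Smooth L1: 0.5 * x^2 if |x| < 1, else |x| - 0.5."""
    a = np.abs(diff)
    return np.where(a < 1.0, 0.5 * a * a, a - 0.5)

def total_loss(score, gt_score, dx, gt_dx, dy, gt_dy, lam=1.0):
    """Assumed combination: Dice + lam * Smooth L1 averaged over GT text pixels."""
    mask = gt_score > 0
    n = max(int(mask.sum()), 1)
    reg = (smooth_l1(dx - gt_dx)[mask].sum() + smooth_l1(dy - gt_dy)[mask].sum()) / n
    return dice_loss(score, gt_score) + lam * reg
```

Dice is computed over the whole map, so it is less sensitive to the imbalance between text and background pixels than a per-pixel cross-entropy.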

Experimental results

ICDAR13
MSRA-TD500

CTW1500
Total-Text

Ablation experiments on CTW1500
- Baseline-EAST

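Open questions

Some regressed points are outliers; how can they be removed?
The final prediction uses only the pixels whose class_score_map score exceeds the threshold, plus their dx and dy, and computes the enclosing polygon from the resulting regressed boundary point map; the information in the class_score_map itself is not used. (Would combining it improve the results?)
Which triangulation algorithm is used?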

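Takeaways and issues

Using embeddings to learn the relationships between characters is a relatively new starting point. Overall, the method still follows the traditional bottom-up approach, involves multiple steps, and is probably slow. It feels engineering-oriented, and the experiments also indicate that fairly engineering-level tricks bring the most noticeable gains.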
