Paper Quick Read (Yongchao Xu — [2018] TextField: Learning A Deep Direction Field for Irregular Scene Text Detection)



Paper

Yongchao Xu——【2018】TextField_Learning A Deep Direction Field for Irregular Scene Text Detection

Authors

Highlights
  1. The proposed TextField method is quite novel: it uses the vector from each pixel to its nearest boundary point to distinguish different text instances.
Method Overview

For curved text detection, the paper follows an instance-segmentation approach and proposes TextField, a new representation for segmentation pixels, aiming to solve the problem of adjacent text instances sticking together.

TextField is a two-dimensional vector v assigned to each pixel of the segmentation score map. It is defined as the vector from each text pixel to its nearest boundary point, and has the following properties:

  • Non-text pixels = (0, 0); text pixels $\ne$ (0, 0)
  • The vector's magnitude can be used to distinguish text pixels from non-text pixels
  • The vector's direction can be used in post-processing to help group pixels into text instances
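The two uses of the field listed above can be sketched as follows (a minimal NumPy sketch; the `(H, W, 2)` layout, the function name, and the 0.5 magnitude threshold are my own assumptions, not values from the paper):

```python
import numpy as np

def field_to_mask_and_angle(field, thresh=0.5):
    """Split a predicted two-channel direction field into a text mask
    and a per-pixel direction angle.

    field: (H, W, 2) array; each pixel holds a predicted 2-D vector.
    thresh: hypothetical magnitude threshold separating text from
            non-text pixels.
    """
    magnitude = np.linalg.norm(field, axis=-1)        # |v| per pixel
    text_mask = magnitude > thresh                    # text vs. non-text
    angle = np.arctan2(field[..., 1], field[..., 0])  # direction in radians
    return text_mask, angle

# Toy example: one "text" pixel with a unit vector, one background pixel.
field = np.zeros((1, 2, 2))
field[0, 0] = [1.0, 0.0]
mask, angle = field_to_mask_and_angle(field)
```

Magnitude answers "is this pixel text?", while the angle is what the paper's post-processing consumes to separate touching instances.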

The overall detection pipeline: a VGG+FPN network learns the two score maps of the TextField, and post-processing involving superpixels, merging, and morphological operations is then applied to these maps to obtain text instances.

Fig. 3: Pipeline of the proposed method. Given an image, the network learns a novel direction field in terms of a two-channel map, which can be regarded as an image of two-dimensional vectors. To better show the predicted direction field, we calculate and visualize its magnitude and direction information. Text instances are then obtained based on this information via the proposed post-processing using some morphological tools.

Method Details
  • Direction field illustration

Fig. 1: Different text representations. Classical relatively simple text representations in (a-c) fail to accurately delimit irregular texts. The text instances in (e) stick together using binary text mask representation in (d), requiring heavy postprocessing to extract text instances. The proposed direction field in (f) is able to precisely describe irregular text instances.

  • Network structure

    VGG16+FPN

Fig. 5: Network architecture. We adopt the pre-trained VGG16 [52] as the backbone network and multi-level feature fusion to capture multi-scale text instances. The network is trained to predict a dense per-pixel direction field.

  • TextField vector definition

For each pixel p inside a text instance T, let Np be the nearest pixel to p lying outside the text instance T. We then define a two-dimensional unit vector Vgt(p) that points away from Np to the underlying text pixel p. This unit vector Vgt(p) directly encodes the approximate relative location of p inside T and highlights the boundary between adjacent text instances.

$$V_{gt}(p) = \begin{cases} \overrightarrow{N_p p}\,/\,|\overrightarrow{N_p p}|, & p \in T \\ (0, 0), & p \notin T \end{cases}$$

where $|\overrightarrow{N_p p}|$ denotes the length of the vector from pixel $N_p$ to $p$, and $T$ stands for all the text instances in an image. In practice, for each text pixel $p$, its nearest pixel $N_p$ outside the containing text instance can be computed simply by a distance transform algorithm.


Fig. 4: Illustration of the proposed direction field. Given an image and its text annotation, a binary text mask can be easily generated. For each text pixel p, we find its nearest non-text pixel Np. Then, a two-dimensional unit vector that points away from Np to p is defined as the direction field on p. For non-text pixels, the direction field is set to (0, 0). On the right, we visualize the direction information of the text direction field.
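The distance-transform trick mentioned in the definition maps directly onto `scipy.ndimage.distance_transform_edt` with `return_indices=True`, which returns for each pixel the coordinates of its nearest zero-valued pixel. A sketch of ground-truth generation under that assumption (the function name and the 1 = text mask convention are my own):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def direction_field_gt(text_mask):
    """Ground-truth TextField from a binary text mask (1 = text, 0 = non-text).

    For each text pixel p, the Euclidean distance transform with
    return_indices=True yields its nearest non-text pixel Np; the unit
    vector from Np to p is the field value, and (0, 0) on background.
    """
    _, (iy, ix) = distance_transform_edt(text_mask, return_indices=True)
    ys, xs = np.indices(text_mask.shape)
    vy, vx = ys - iy, xs - ix                 # vector Np -> p
    norm = np.hypot(vy, vx)
    norm[norm == 0] = 1.0                     # background points to itself; avoid /0
    field = np.stack([vy / norm, vx / norm], axis=-1)
    field[text_mask == 0] = 0.0               # non-text pixels -> (0, 0)
    return field
```

On a mask with a single text pixel, the returned field is a unit vector at that pixel and (0, 0) everywhere else, matching the definition above.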

  • Loss function

    Euclidean distance, weighted by text instance area
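A sketch of such an area-weighted Euclidean loss (the exact weighting scheme below, inverse instance area, is my assumption; the paper only states that the loss is weighted by instance area):

```python
import numpy as np

def weighted_field_loss(pred, gt, instance_labels):
    """L2 loss between predicted and ground-truth direction fields,
    with per-pixel weights balancing text instances of different areas.

    pred, gt: (H, W, 2) direction fields.
    instance_labels: (H, W) int map; 0 = background, k > 0 = instance id.
    """
    weights = np.ones(instance_labels.shape, dtype=float)
    for k in np.unique(instance_labels):
        if k == 0:
            continue
        area = np.sum(instance_labels == k)
        weights[instance_labels == k] = 1.0 / area  # small instances weigh more
    sq_err = np.sum((pred - gt) ** 2, axis=-1)      # per-pixel squared L2
    return np.sum(weights * sq_err) / np.sum(weights)
```

The weighting keeps one huge text line from dominating the gradient over many small words, which is the usual motivation for area-based weighting in segmentation losses.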


  • Post-processing pipeline

Fig. 6: Illustration of the proposed post-processing. (a): Directions on candidate text pixels; (b): Text superpixels (in different color) and their representatives (in white); (c): Dilated and grouped representatives of text superpixels; (d): Labels of filtered representatives; (e): Candidate text instances; (f) Final segmented text instances.
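As a point of contrast with the multi-step pipeline in Fig. 6, here is a greatly simplified stand-in: threshold the field magnitude and take connected components. This is *not* the paper's method — plain connected components cannot split touching instances, which is exactly what the direction-based superpixel grouping and morphology in steps (a)-(f) exist to do:

```python
import numpy as np
from scipy.ndimage import label

def naive_instances(field, thresh=0.5):
    """Simplified baseline: candidate text pixels by magnitude threshold,
    then 4-connected components as instances. The threshold value and
    function name are hypothetical."""
    magnitude = np.linalg.norm(field, axis=-1)
    candidates = magnitude > thresh
    labels, n = label(candidates)   # connected-component labelling
    return labels, n

# Two well-separated "text" pixels become two instances.
field = np.zeros((5, 5, 2))
field[0, 0] = [1.0, 0.0]
field[4, 4] = [0.0, 1.0]
labels, n = naive_instances(field)
```

The gap between this baseline and Fig. 6 illustrates why the paper's post-processing is heavy: the direction information is only exploited in the grouping steps this sketch omits.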

Experimental Results
  • SCUT-CTW1500

  • Total-Text

  • ICDAR2015

  • MSRA-TD500
Takeaways and Questions
  1. Points the paper does not explain clearly: how the distance to the nearest boundary point is computed, and the many post-processing operations are also hard to pin down.
  2. The method is very novel, but the post-processing is too complex — it alone accounts for about 1/4 of the runtime — and the vector representation is not very intuitive, so it is not a particularly general-purpose method.

