Paper Quick Read (Yongchao Xu — [2018] TextField: Learning A Deep Direction Field for Irregular Scene Text Detection)



Paper

Yongchao Xu——【2018】TextField_Learning A Deep Direction Field for Irregular Scene Text Detection

Authors

Highlights
  1. The proposed TextField method is quite novel: it uses the vector from each pixel to its nearest boundary point to distinguish different text instances.
Method Overview

For curved text detection, the paper follows an instance-segmentation approach and proposes TextField, a new representation for segmentation pixels, aiming to solve the problem of adjacent text instances sticking together.

TextField is a two-dimensional vector v assigned to each pixel of the segmentation score map. It is defined as the vector from each text pixel to its nearest boundary point, and has the following properties:

  • Non-text pixels = (0, 0); text pixels $\ne$ (0, 0)
  • The vector's magnitude can be used to distinguish text pixels from non-text pixels
  • The vector's direction can be used in post-processing to help group pixels into text instances
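The two uses of the field listed above can be sketched as follows (a minimal NumPy sketch; the `(H, W, 2)` layout, the function name, and the 0.5 magnitude threshold are my own assumptions, not values from the paper):

```python
import numpy as np

def field_to_mask_and_angle(field, thresh=0.5):
    """Split a predicted two-channel direction field into a text mask
    and a per-pixel direction angle.

    field: (H, W, 2) array; each pixel holds a predicted 2-D vector.
    thresh: hypothetical magnitude threshold separating text from
            non-text pixels.
    """
    magnitude = np.linalg.norm(field, axis=-1)        # |v| per pixel
    text_mask = magnitude > thresh                    # text vs. non-text
    angle = np.arctan2(field[..., 1], field[..., 0])  # direction in radians
    return text_mask, angle

# Toy example: one "text" pixel with a unit vector, one background pixel.
field = np.zeros((1, 2, 2))
field[0, 0] = [1.0, 0.0]
mask, angle = field_to_mask_and_angle(field)
```

Magnitude answers "is this pixel text?", while the angle is what the paper's post-processing consumes to separate touching instances.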

The overall detection pipeline: a VGG+FPN network learns the two score maps of the TextField, and post-processing involving superpixels, merging, and morphological operations is then applied to these maps to obtain text instances.

Fig. 3: Pipeline of the proposed method. Given an image, the network learns a novel direction field in terms of a two-channel map, which can be regarded as an image of two-dimensional vectors. To better show the predicted direction field, we calculate and visualize its magnitude and direction information. Text instances are then obtained based on this information via the proposed post-processing using some morphological tools.

Method Details
  • Direction field illustration

Fig. 1: Different text representations. Classical relatively simple text representations in (a-c) fail to accurately delimit irregular texts. The text instances in (e) stick together using binary text mask representation in (d), requiring heavy postprocessing to extract text instances. The proposed direction field in (f) is able to precisely describe irregular text instances.

  • Network structure

    VGG16+FPN

Fig. 5: Network architecture. We adopt the pre-trained VGG16 [52] as the backbone network and multi-level feature fusion to capture multi-scale text instances. The network is trained to predict a dense per-pixel direction field.

  • TextField vector definition

For each pixel p inside a text instance T, let Np be the nearest pixel to p lying outside the text instance T. We then define a two-dimensional unit vector Vgt(p) that points away from Np to the underlying text pixel p. This unit vector Vgt(p) directly encodes the approximate relative location of p inside T and highlights the boundary between adjacent text instances.

$$V_{gt}(p) = \begin{cases} \overrightarrow{N_p p}\,/\,|\overrightarrow{N_p p}|, & p \in T \\ (0, 0), & p \notin T \end{cases}$$

where $|\overrightarrow{N_p p}|$ denotes the length of the vector from pixel $N_p$ to $p$, and $T$ stands for all the text instances in an image. In practice, for each text pixel $p$, its nearest pixel $N_p$ outside the containing text instance can be computed simply by a distance transform algorithm.


Fig. 4: Illustration of the proposed direction field. Given an image and its text annotation, a binary text mask can be easily generated. For each text pixel p, we find its nearest non-text pixel Np. Then, a two-dimensional unit vector that points away from Np to p is defined as the direction field on p. For non-text pixels, the direction field is set to (0, 0). On the right, we visualize the direction information of the text direction field.
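The distance-transform trick mentioned in the definition maps directly onto `scipy.ndimage.distance_transform_edt` with `return_indices=True`, which returns for each pixel the coordinates of its nearest zero-valued pixel. A sketch of ground-truth generation under that assumption (the function name and the 1 = text mask convention are my own):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def direction_field_gt(text_mask):
    """Ground-truth TextField from a binary text mask (1 = text, 0 = non-text).

    For each text pixel p, the Euclidean distance transform with
    return_indices=True yields its nearest non-text pixel Np; the unit
    vector from Np to p is the field value, and (0, 0) on background.
    """
    _, (iy, ix) = distance_transform_edt(text_mask, return_indices=True)
    ys, xs = np.indices(text_mask.shape)
    vy, vx = ys - iy, xs - ix                 # vector Np -> p
    norm = np.hypot(vy, vx)
    norm[norm == 0] = 1.0                     # background points to itself; avoid /0
    field = np.stack([vy / norm, vx / norm], axis=-1)
    field[text_mask == 0] = 0.0               # non-text pixels -> (0, 0)
    return field
```

On a mask with a single text pixel, the returned field is a unit vector at that pixel and (0, 0) everywhere else, matching the definition above.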

  • Loss function

    Euclidean distance, weighted by text instance area
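A sketch of such an area-weighted Euclidean loss (the exact weighting scheme below, inverse instance area, is my assumption; the paper only states that the loss is weighted by instance area):

```python
import numpy as np

def weighted_field_loss(pred, gt, instance_labels):
    """L2 loss between predicted and ground-truth direction fields,
    with per-pixel weights balancing text instances of different areas.

    pred, gt: (H, W, 2) direction fields.
    instance_labels: (H, W) int map; 0 = background, k > 0 = instance id.
    """
    weights = np.ones(instance_labels.shape, dtype=float)
    for k in np.unique(instance_labels):
        if k == 0:
            continue
        area = np.sum(instance_labels == k)
        weights[instance_labels == k] = 1.0 / area  # small instances weigh more
    sq_err = np.sum((pred - gt) ** 2, axis=-1)      # per-pixel squared L2
    return np.sum(weights * sq_err) / np.sum(weights)
```

The weighting keeps one huge text line from dominating the gradient over many small words, which is the usual motivation for area-based weighting in segmentation losses.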


  • Post-processing pipeline

Fig. 6: Illustration of the proposed post-processing. (a): Directions on candidate text pixels; (b): Text superpixels (in different color) and their representatives (in white); (c): Dilated and grouped representatives of text superpixels; (d): Labels of filtered representatives; (e): Candidate text instances; (f) Final segmented text instances.
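As a point of contrast with the multi-step pipeline in Fig. 6, here is a greatly simplified stand-in: threshold the field magnitude and take connected components. This is *not* the paper's method — plain connected components cannot split touching instances, which is exactly what the direction-based superpixel grouping and morphology in steps (a)-(f) exist to do:

```python
import numpy as np
from scipy.ndimage import label

def naive_instances(field, thresh=0.5):
    """Simplified baseline: candidate text pixels by magnitude threshold,
    then 4-connected components as instances. The threshold value and
    function name are hypothetical."""
    magnitude = np.linalg.norm(field, axis=-1)
    candidates = magnitude > thresh
    labels, n = label(candidates)   # connected-component labelling
    return labels, n

# Two well-separated "text" pixels become two instances.
field = np.zeros((5, 5, 2))
field[0, 0] = [1.0, 0.0]
field[4, 4] = [0.0, 1.0]
labels, n = naive_instances(field)
```

The gap between this baseline and Fig. 6 illustrates why the paper's post-processing is heavy: the direction information is only exploited in the grouping steps this sketch omits.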

Experimental Results
  • SCUT-CTW1500

  • Total-Text

  • ICDAR2015

  • MSRA-TD500
Takeaways and Questions
  1. Points the paper does not explain clearly: how the distance to the nearest boundary point is computed, and the many post-processing operations are also hard to pin down.
  2. The method is very novel, but the post-processing is too complex — it alone accounts for about 1/4 of the runtime — and the vector representation is not very intuitive, so it is not a particularly general-purpose method.

