OCR Overview


OCR Development Trends

  • Scene text detection
  • Scene text recognition
  • End-to-end scene text recognition

Scene Text Detection

Example methods:

  • Regression-based methods

    • Gupta et al, CVPR 2016; Tian et al, ECCV 2016;
    • Shi, Bai, et al, ICCV 2017; Liu et al, CVPR 2017;
    • Liao et al, AAAI 2017; Hu et al, ICCV 2017 ...
  • Segmentation-based methods

    • Zhong et al, CVPR 2016; Zhou et al, CVPR 2017;
    • Wu et al, ICCV 2017; Deng et al, AAAI 2018;
    • X Li, CVPR 2019; W Wang, et al, CVPR 2019 ...
  • Hybrid methods (segmentation + regression)

    • He et al, ICCV 2017; Lyu et al, CVPR 2018;
    • Liao et al, CVPR 2018; Long et al, ECCV 2018;
    • Liu et al, IJCAI 2019 ...

Trends:

Horizontal rectangular boxes \(\longrightarrow\) multi-oriented rectangular boxes \(\longrightarrow\) multi-oriented quadrilaterals \(\longrightarrow\) curved text \(\longrightarrow\) arbitrary shapes

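Notes:

  • Segmentation-based methods have difficulty accurately separating adjacent or overlapping text instances
  • Regression-based methods tend to miss parts of long text lines
    • Bounding-box regression methods require well-chosen anchor parameters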
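
To make the anchor-parameter note concrete, here is a minimal sketch of how anchor shapes are typically enumerated from scales and aspect ratios, and how poorly a generic set can overlap a long text line. The function names, base size, scales, and ratios are all illustrative assumptions, not settings taken from any of the cited papers.

```python
import math

def make_anchors(base_size, scales, aspect_ratios):
    """Return (width, height) pairs for every scale / aspect-ratio combination.

    aspect_ratio is width / height; each anchor has area (base_size * scale) ** 2.
    """
    anchors = []
    for scale in scales:
        area = (base_size * scale) ** 2
        for ratio in aspect_ratios:
            w = math.sqrt(area * ratio)
            h = math.sqrt(area / ratio)
            anchors.append((w, h))
    return anchors

def iou_wh(a, b):
    """IoU of two boxes that share the same centre, given as (width, height)."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

# Hypothetical settings: generic object-style ratios vs. ratios stretched for text.
generic = make_anchors(16, scales=[2, 4], aspect_ratios=[0.5, 1, 2])
text_like = make_anchors(16, scales=[2, 4], aspect_ratios=[1, 2, 5, 10])

long_word = (320, 32)  # a 10:1 text line, in pixels (made-up example)
best_generic = max(iou_wh(a, long_word) for a in generic)
best_text = max(iou_wh(a, long_word) for a in text_like)
print(f"best IoU with generic anchors: {best_generic:.2f}")
print(f"best IoU with text anchors:    {best_text:.2f}")
```

Even the stretched set only partly covers the long line in this toy case, which gives a feel for why anchor settings need tuning per dataset.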

Anchor & RPN parameter-tuning issues:

Examples of anchor-free regression methods:

  • Segmentation based methods
  • C.He et al, Direct Regression..., ICCV 2017, TIP 2018.
  • Z Zhong et al, An Anchor-Free Region Proposal Network..., IJDAR 2019.
  • Zhi Tian, Chunhua Shen, et al, FCOS, CVPR, 2019.
  • Chenchen Zhu, Yihui He, et al, FSAF, CVPR, 2019.
  • Tao Kong, Fuchun Sun et al, FoveaBox, arXiv 2019.

Why anchor free?
Most RPN-based regression methods require well-chosen anchor parameters
E.g.: SSD \(\longrightarrow\) TextBoxes (AAAI 2017)
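
As a contrast to anchor-based regression, the sketch below shows the kind of per-pixel target an anchor-free detector in the style of FCOS regresses: the distances from a location to the four sides of its box. It is a simplified illustration only. The helper names and the example box are assumptions, and it leaves out other parts of such detectors (for example multi-level feature assignment and center-ness).

```python
def ltrb_targets(px, py, box):
    """Distances from pixel (px, py) to the left, top, right and bottom sides of box.

    box is (x_min, y_min, x_max, y_max). Returns None when the pixel lies outside
    the box, i.e. it is not a positive sample for this box.
    """
    x_min, y_min, x_max, y_max = box
    l, t = px - x_min, py - y_min
    r, b = x_max - px, y_max - py
    if min(l, t, r, b) < 0:
        return None
    return (l, t, r, b)

def decode(px, py, ltrb):
    """Invert ltrb_targets: recover the box from a pixel and its four distances."""
    l, t, r, b = ltrb
    return (px - l, py - t, px + r, py + b)

text_box = (40, 100, 360, 132)  # hypothetical long text line
target = ltrb_targets(200, 116, text_box)
recovered = decode(200, 116, target)
print(target)
print(recovered)
```

No scale or aspect-ratio list appears anywhere in this formulation, which is the practical appeal: the box shape comes from the regressed distances rather than from a preset anchor.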

Alternative anchor design?
Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection, AAAI 2019.

Scene Text Recognition

Scene text recognition methods:

  • CTC-based methods

    • P.He et al, AAAI 2016 (DTRN: CNN+RNN+CTC)
    • B. Shi et al, TPAMI 2017 (CRNN: CNN+RNN+CTC)
    • F Yin, et al, arXiv 2017 (CNN+CTC)
    • Y Wu, et al, arXiv 2018 (CNN+CTC)
    • Y Liu et al, ECCV 2018 (GAN+CTC)
  • Attention-based methods

    • C Lee et al, CVPR 2016; B Shi et al, CVPR 2016
    • X Yang et al, IJCAI 2017
    • Bai et al, CVPR 2018; Liu et al, AAAI 2018
    • Shi et al, TPAMI 2018 (ASTER)
    • Luo et al, PR 2019 (MORAN)

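Trends:

Regular text \(\longrightarrow\) irregular text recognition
CTC \(\longrightarrow\) Attention (1D, 2D)
Detection + recognition \(\longrightarrow\) end-to-end detection and recognition

Attention or CTC?

CTC works better for long text; attention works better for short text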

Limitation of Attention and CTC

CTC:

  • Can hardly be directly applied to 2D prediction
  • Large computation involved for long sequence
  • Performance degradation for repeat patterns
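
One way to see why repeated characters are awkward for CTC is its collapsing rule: consecutive identical labels are merged and blanks are then dropped, so a doubled letter survives only if a blank separates the two copies. The sketch below illustrates that rule. The blank symbol and function names are choices made here, and it does not reproduce any cited recognizer.

```python
BLANK = "-"  # symbol chosen here for the CTC blank; any reserved token works

def ctc_collapse(path):
    """Map a per-frame label path to its output string: merge repeats, then drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return "".join(out)

def min_frames(text):
    """Fewest frames needed to emit text: one per character plus one blank per adjacent repeat."""
    repeats = sum(1 for a, b in zip(text, text[1:]) if a == b)
    return len(text) + repeats

with_blank = ctc_collapse("hheel-llo")    # blank between the two l's
without_blank = ctc_collapse("hheelllo")  # no blank: the l's merge into one
print(with_blank, without_blank)
print(min_frames("hello"), min_frames("helo"))
```

A word with a doubled letter needs an extra frame for the separating blank, so narrow or densely repeated patterns leave the model less room to place it.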

Attention:

  • Misalignment problem (attention drift)
  • More memory size required
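
For intuition about both attention points, the following sketch runs a single decoding step of plain dot-product attention over a 1D feature sequence. Every output character gets one weight per input position, so the weights alone scale with output length times input length. If the peak of those weights lands on the wrong column, the character is read from the wrong region, which is the misalignment referred to as attention drift. The feature values and names are made up for illustration, and real recognizers use learned scoring functions rather than a raw dot product.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, features):
    """One decoding step of dot-product attention over a 1D feature sequence.

    query: decoder state as a list of floats; features: one equally long list
    per horizontal position. Returns (weights, context).
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return weights, context

# Toy 2-D features for four image columns; the values are made up.
features = [[1.0, 0.0], [0.0, 1.0], [0.0, 3.0], [1.0, 1.0]]
weights, context = attend([0.0, 2.0], features)
focus = max(range(len(weights)), key=weights.__getitem__)  # column with the largest weight
print(focus, [round(w, 3) for w in weights])
```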

Why End2End?

  • Prevent errors from accumulating
    • errors can accumulate in a cascade of detection + recognition, which may lead to a large fraction of garbage predictions
  • Joint optimization helps improve overall performance
  • Easier to maintain and adapt to new domains
    • maintaining a cascaded pipeline with data and model dependencies requires substantial engineering effort
  • Faster, Smaller, Stronger

Some new techniques for bridging the detector and the recognizer

  • RoI Rotate (multi-oriented e2e)
    • X Liu, et al, FOTS, CVPR 2018
  • Tailored RoI pooling (aspect-ratio-preserving resampling)
    • H Li et al. Towards End-to-End Text Spotting in Natural Scenes, arXiv 20190617 (extension of "H Li et al ICCV 2017")
  • RoI Masking (arbitrary-shape e2e)
    • S Qin, A Bissacco, et al(Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019
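
Two of the bridging ideas above can be sketched in a few lines: picking an output width that keeps a text region's aspect ratio at a fixed height, and zeroing out features that fall outside an arbitrary-shape text mask. This is a rough illustration under stated assumptions. The helper names, sizes, and toy values are hypothetical, and the cited papers' actual operators (which involve feature resampling and learned masks) are not reproduced here.

```python
def pooled_width(roi_w, roi_h, out_h, max_w):
    """Output width that keeps the RoI's aspect ratio at a fixed output height."""
    w = round(roi_w * out_h / roi_h)
    return max(1, min(max_w, w))

def apply_mask(features, mask):
    """Zero out feature cells outside the text region (mask holds 0 or 1)."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]

# A long line is capped at max_w; a short word keeps its proportions.
long_w = pooled_width(320, 32, out_h=8, max_w=64)
short_w = pooled_width(48, 32, out_h=8, max_w=64)
print(long_w, short_w)

# Toy 2x3 feature map and a mask following a slanted text shape.
features = [[5, 6, 7], [8, 9, 10]]
mask = [[0, 1, 1], [1, 1, 0]]
masked = apply_mask(features, mask)
print(masked)
```

Pooling every region to one fixed width would squash long lines and stretch short words, which is the distortion an aspect-ratio-preserving scheme aims to avoid.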

