OCR Survey

This article is a repost (published 2019-10-18 09:22). Tags: OCR survey / ocr

OCR development trends

Scene text detection
Scene text recognition
End-to-end scene text recognition

Scene text detection

Example methods:

Regression-based methods
- Gupta et al, CVPR 2016; Tian et al, ECCV 2016;
- Shi, Bai, et al, ICCV 2017; Liu et al, CVPR 2017;
- Liao et al, AAAI 2017; Hu et al, ICCV 2017 ...
Segmentation-based methods
- Zhong et al, CVPR 2016; Zhou et al, CVPR 2017;
- Wu et al, ICCV 2017; Deng et al, AAAI 2018;
- X Li, CVPR 2019; W Wang, et al, CVPR 2019 ...
Hybrid methods (segmentation + regression)
- He et al, ICCV 2017; Lyu et al, CVPR 2018;
- Liao et al, CVPR 2018; Long et al, ECCV 2018;
- Liu et al, IJCAI 2019 ...

Development trends:

Horizontal rectangles \(\longrightarrow\) multi-oriented rectangles \(\longrightarrow\) multi-oriented quadrilaterals \(\longrightarrow\) curved text \(\longrightarrow\) arbitrary shapes
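One way to read this progression is that each stage generalizes the previous box format, and all of them can be stored as a list of vertices. The minimal sketch below (function names are hypothetical, not taken from any cited paper) converts a horizontal box and a rotated rectangle `(cx, cy, w, h, theta)` into 4-point quadrilaterals, and computes the area of any polygon with the shoelace formula, so a curved-text polygon with more vertices is handled by the same code.

```python
import math

def box_to_quad(x_min, y_min, x_max, y_max):
    """Horizontal rectangle -> 4 vertices (clockwise in image coordinates)."""
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]

def rotated_rect_to_quad(cx, cy, w, h, theta):
    """Rotated rectangle (center, size, angle in radians) -> 4 vertices."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    quad = []
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]:
        # Rotate the corner offset around the center, then translate.
        quad.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return quad

def polygon_area(points):
    """Shoelace formula: works for quadrilaterals and multi-vertex curved-text polygons."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# A horizontal box is the theta = 0 special case of a rotated rectangle.
print(rotated_rect_to_quad(5, 5, 4, 2, 0.0))
# A curved text band described by 6 vertices (toy coordinates).
curved = [(0, 0), (5, 1), (10, 0), (10, 3), (5, 4), (0, 3)]
print(polygon_area(curved))
```

Keeping every representation as a vertex list is one design choice that lets later stages (NMS, evaluation, cropping) share code across box types.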

Notes:

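Segmentation-based methods have difficulty accurately separating adjacent or overlapping text instances
Regression-based methods often fail to cover long text lines completely
- Bounding-box regression methods require well-chosen anchor parameters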
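The difficulty with adjacent text comes from how segmentation output is usually turned into instances: connected regions of the predicted text mask become text boxes. The sketch below assumes a plain 4-connected component labelling on a toy binary mask (the helper `count_text_instances` is hypothetical); two words whose masks touch collapse into a single instance. This is one reason later segmentation methods predict extra cues, such as shrunk text kernels or links between pixels, to keep neighbouring instances apart.

```python
from collections import deque

def count_text_instances(mask):
    """Count 4-connected foreground regions in a binary mask ('1' = text pixel)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] == "1" and not seen[r][c]:
                count += 1  # found a new, unvisited region
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of this region
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] == "1" and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

separated = ["111000111",
             "111000111"]  # two words with a clear gap
touching = ["111101111",
            "111111111"]   # two words whose masks touch along the bottom row
print(count_text_instances(separated), count_text_instances(touching))
```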

Anchor and RPN parameter-tuning issues:

Examples of anchor-free regression methods:

Segmentation-based methods
W. He et al, Direct Regression..., ICCV 2017, TIP 2018.
Z Zhong et al, An Anchor-Free Region Proposal Network..., IJDAR 2019.
Zhi Tian, Chunhua Shen, et al, FCOS, ICCV 2019.
Chenchen Zhu, Yihui He, et al, FSAF, CVPR, 2019.
Tao Kong, Fuchun Sun et al, FoveaBox, arXiv 2019.
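Anchor-free regression drops predefined box shapes: each location inside a text region regresses the box geometry directly from its own position. The sketch below follows an FCOS-style encoding of the distances `(l, t, r, b)` from a location to the four box sides, plus a quadrilateral variant that regresses offsets to four vertices. It is an illustration under those assumptions, not the exact parameterization of any paper listed above, and the function names are hypothetical.

```python
def encode_ltrb(x, y, box):
    """Regression target at location (x, y) for box (x0, y0, x1, y1):
    distances to the left, top, right and bottom sides."""
    x0, y0, x1, y1 = box
    return (x - x0, y - y0, x1 - x, y1 - y)

def decode_ltrb(x, y, dists):
    """Invert encode_ltrb: rebuild the box from a location and its four distances."""
    left, top, right, bottom = dists
    return (x - left, y - top, x + right, y + bottom)

def is_inside(x, y, box):
    """Locations inside the box (all four distances positive) are candidate positives."""
    return all(d > 0 for d in encode_ltrb(x, y, box))

def decode_quad(x, y, offsets):
    """Quadrilateral variant: offsets = (dx1, dy1, ..., dx4, dy4) from (x, y) to each vertex."""
    return [(x + offsets[2 * k], y + offsets[2 * k + 1]) for k in range(4)]

box = (10, 20, 110, 50)
target = encode_ltrb(30, 30, box)   # no anchor shape is involved
print(target, decode_ltrb(30, 30, target), is_inside(5, 30, box))
```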

Why anchor-free?
Most RPN-based regression methods require carefully chosen anchor parameters
E.g.: SSD \(\longrightarrow\) TextBoxes (AAAI 2017)
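The anchor problem is easy to reproduce numerically. The sketch below generates SSD-style anchors of fixed area with width/height ratio `a` (width `s*sqrt(a)`, height `s/sqrt(a)`) and compares the best IoU against a long 200x20 text line for a generic ratio set and for a set with much wider ratios, similar in spirit to the long default boxes of TextBoxes. The specific ratios, box sizes, and helper names are assumptions chosen for illustration.

```python
import math

def make_anchors(cx, cy, scale, aspect_ratios):
    """One anchor (x0, y0, x1, y1) per ratio a = width / height, centered at (cx, cy), area scale**2."""
    anchors = []
    for a in aspect_ratios:
        w, h = scale * math.sqrt(a), scale / math.sqrt(a)
        anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x0, y0, x1, y1)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def best_iou(gt_box, aspect_ratios):
    """Best IoU of same-center anchors whose area matches the ground-truth box."""
    x0, y0, x1, y1 = gt_box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    scale = math.sqrt((x1 - x0) * (y1 - y0))
    return max(iou(anchor, gt_box) for anchor in make_anchors(cx, cy, scale, aspect_ratios))

text_line = (0, 40, 200, 60)  # a long, thin text line: 200 x 20 pixels
generic = best_iou(text_line, [0.5, 1.0, 2.0])
long_ratios = best_iou(text_line, [1, 2, 3, 5, 7, 10])
print(f"generic ratios: {generic:.2f}, long ratios: {long_ratios:.2f}")
```

Even with the anchor scale matched to the box area, the generic set peaks at an IoU of about 0.29, below a typical 0.5 matching threshold, while the wide ratios reach about 1.0.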

Alternative anchor design?
Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection, AAAI 2019.

Scene text recognition

Scene text recognition methods:

CTC-based methods
- P.He et al, AAAI 2016 (DTRN: CNN+RNN+CTC)
- B. Shi et al, TPAMI 2017 (CRNN: CNN+RNN+CTC)
- F Yin, et al, arXiv 2017 (CNN+CTC)
- Y Wu, et al, arXiv 2018 (CNN+CTC)
- Y Liu et al, ECCV 2018 (GAN+CTC)
Attention-based methods
- C Lee et al, CVPR 2016; B Shi et al, CVPR 2016
- X Yang et al, IJCAI 2017
- Bai et al, CVPR 2018; Liu et al, AAAI 2018
- Shi et al, TPAMI 2018 (ASTER)
- Luo et al, PR 2019 (MORAN)
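CTC-based recognizers output one label distribution per horizontal frame, including a special blank symbol, and the transcription is read off by collapsing the frame sequence. The sketch below shows best-path (greedy) decoding: take the most likely symbol per frame, merge consecutive repeats, then drop blanks. Using `-` as the blank and the toy alphabet are assumptions; practical decoders often use beam search instead.

```python
def ctc_greedy_decode(frame_labels, blank="-"):
    """Best-path CTC decoding: merge consecutive repeats, then remove blanks."""
    output = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            output.append(label)
        previous = label
    return "".join(output)

def decode_from_probs(frame_probs, alphabet, blank="-"):
    """Pick the most likely symbol in each frame, then apply greedy CTC decoding."""
    best = [alphabet[max(range(len(p)), key=lambda i: p[i])] for p in frame_probs]
    return ctc_greedy_decode(best, blank)

print(ctc_greedy_decode("hh-e-ll-ll-oo"))  # the blank between "ll" and "ll" keeps both l's
print(ctc_greedy_decode("hheelllloo"))     # without a blank, the repeated l's merge
```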

Development trends:

Regular text \(\longrightarrow\) irregular text recognition
CTC \(\longrightarrow\) Attention (1D, 2D)
Separate detection + recognition \(\longrightarrow\) end-to-end detection and recognition
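The move from 1D to 2D attention mainly changes where the decoder looks: at each step it scores every position of the encoder feature map against its current state, normalizes the scores with a softmax, and reads a weighted sum (the glimpse). The sketch below uses plain dot-product scoring on an H x W grid of feature vectors, which is an assumption (many recognizers use learned additive scoring); a 1D attention decoder is the special case H = 1. The function name is hypothetical.

```python
import math

def attention_glimpse_2d(feature_map, query):
    """feature_map: H x W nested lists of C-dim vectors; query: C-dim decoder state.
    Returns (H x W attention weights, C-dim glimpse vector)."""
    # Dot-product score between the query and every spatial position.
    scores = [[sum(f * q for f, q in zip(vec, query)) for vec in row] for row in feature_map]
    # Softmax over all H * W positions (subtract the max for numerical stability).
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    weights = [[e / total for e in row] for row in exps]
    # Glimpse: attention-weighted sum of the feature vectors.
    glimpse = [0.0] * len(query)
    for w_row, f_row in zip(weights, feature_map):
        for weight, vec in zip(w_row, f_row):
            for k, value in enumerate(vec):
                glimpse[k] += weight * value
    return weights, glimpse

feature_map = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
weights, glimpse = attention_glimpse_2d(feature_map, query=[10.0, 0.0])
print(weights[0][0], glimpse)
```

If the weights lock onto the wrong position, the glimpse carries the wrong character's features, which is the attention drift problem noted below.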

Attention or CTC?

CTC tends to work better on long text, attention on short text

Limitations of attention and CTC

CTC:

Hard to apply directly to 2D prediction
Heavy computation for long sequences
Degraded performance on repeated patterns
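One concrete reason repeated characters are hard for CTC: because repeats are merged during decoding, two identical consecutive characters must be separated by at least one blank frame. A label of length L with R adjacent repeated pairs therefore needs at least L + R output frames, so strings like "1111" can exceed the frame budget of a narrow feature map. A minimal check, with hypothetical helper names:

```python
def ctc_min_frames(label):
    """Minimum CTC output frames needed to emit `label`:
    one frame per character plus one blank between each adjacent repeated pair."""
    repeats = sum(1 for prev, cur in zip(label, label[1:]) if prev == cur)
    return len(label) + repeats

def fits_ctc(label, num_frames):
    """True if a CTC head with `num_frames` time steps can represent `label` at all."""
    return ctc_min_frames(label) <= num_frames

for text in ["abcd", "hello", "1111"]:
    print(text, ctc_min_frames(text), fits_ctc(text, 6))
```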

Attention:

Misalignment problem (attention drift)
Requires more memory

Why end-to-end?

Prevents errors from accumulating across stages
- in a cascade of detection + recognition, errors accumulate, which can lead to a large fraction of garbage predictions
Joint optimization helps improve overall performance
Easier to maintain and to adapt to new domains
- maintaining a cascaded pipeline with data and model dependencies requires substantial engineering effort
Faster, Smaller, Stronger
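A simple way to see the accumulation argument: in a cascade, a word is read correctly only if it is detected and then recognized from the detected crop, so the stage accuracies multiply. With hypothetical accuracies of 0.90 for each stage:

\[ P(\text{correct}) = P(\text{det}) \cdot P(\text{rec} \mid \text{det}) = 0.90 \times 0.90 = 0.81 \]

A detection that clips part of a word also lowers \(P(\text{rec} \mid \text{det})\); that coupling between the two stages is what joint optimization targets.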

New techniques that bridge the detector and the recognizer

RoI Rotate (multi-oriented end-to-end)
- X Liu, et al, FOTS, CVPR 2018
Tailored RoI pooling (resampling that preserves the aspect ratio)
- H Li et al. Towards End-to-End Text Spotting in Natural Scenes, arXiv 2019-06-17 (extension of "H Li et al ICCV 2017")
RoI Masking (arbitrary-shape end-to-end)
- S Qin, A Bissacco, et al (Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019
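RoI Rotate and aspect-ratio-preserving RoI pooling both turn an oriented text region into an axis-aligned patch of fixed height that the recognizer can read left to right. The pure-Python sketch below assumes a rotated box `(cx, cy, w, h, theta)` on a single-channel feature map stored as nested lists, picks the output width from the box's aspect ratio, and samples each output cell with bilinear interpolation. Function names are hypothetical; this is not the implementation from FOTS or the other papers above.

```python
import math

def bilinear(img, x, y):
    """Bilinear sample of a 2D list at float (x = column, y = row); zero outside."""
    rows, cols = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < rows and 0 <= xx < cols:
                value += (1.0 - abs(x - xx)) * (1.0 - abs(y - yy)) * img[yy][xx]
    return value

def rotated_roi_sample(img, cx, cy, w, h, theta, out_h=8):
    """Resample a rotated box (center, size, angle in radians) to out_h rows,
    choosing the output width so the box's aspect ratio is preserved."""
    out_w = max(1, round(out_h * w / h))
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    patch = []
    for i in range(out_h):
        v = ((i + 0.5) / out_h - 0.5) * h  # offset along the box height
        row = []
        for j in range(out_w):
            u = ((j + 0.5) / out_w - 0.5) * w  # offset along the box width
            # Map the box-local offset (u, v) back to image coordinates.
            x = cx + u * cos_t - v * sin_t
            y = cy + u * sin_t + v * cos_t
            row.append(bilinear(img, x, y))
        patch.append(row)
    return patch

img = [[c for c in range(8)] for _ in range(4)]  # toy feature map: value = column index
patch = rotated_roi_sample(img, 3.5, 1.5, 8, 4, 0.0, out_h=4)
print(len(patch), len(patch[0]), patch[0])
```

Fixing the height and deriving the width from the box keeps long text lines from being squeezed into a square grid, which is the motivation for the tailored pooling above.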
