Text Detection: A Roundup of Text Detection Datasets


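Character recognition and text detection matter in many everyday applications, from simple license plate detection to reading text in complex natural scenes. The best-known conference in this field is the International Conference on Document Analysis and Recognition (ICDAR).

1. Main datasets for text detection and recognition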



Total-Text

paper


COCO-Text, COCO-Text V2

paper


MSRA-TD500
ref paper


ICDAR2017: the competitions cover datasets from many different domains.

Category: Handwritten Historical Document Layout Recognition

cBAD: ICDAR2017 Competition on Baseline Detection
ICDAR2017 Competition on Layout Analysis for Challenging Medieval Manuscripts
ICDAR2017 Competition on Historical Book Analysis

Category: Historical Handwritten Script Analysis

ICDAR 2017 Competition on the Classification of Medieval Handwritings in Latin Script
ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI)
Competition on Multi-script Writer Identification Using LAMIS-MSHD and CERUG Databases

Category: Character/Word Spotting

Competition on Query-by-Example Glyph Spotting of Southeast Asian Palm Leaf Manuscript Images
Handwritten Keyword Spotting Competition

Category: Handwriting Recognition

ICDAR2017 Competition on Handwritten Text Recognition on the READ Dataset
ICDAR2017 Competition on Information Extraction in Historical Handwritten Records

Category: Document Image Binarization

ICDAR2017 Competition on Document Image Binarization (DIBCO 2017)

Category: Document Recognition (Layout analysis & Text Recognition)

ICDAR2017 Competition on Recognition of Documents with Complex Layouts – RDCL2017
ICDAR2017 Competition on Recognition of Early Indian Printed Documents – REID2017
ICDAR2017 Competition on Page Object Detection

Category: Document Reconstruction

Smartphone-captured Document Image Reconstruction from Multiple Views

Category: Post OCR Correction

ICDAR2017 Competition on Post-OCR Text Correction

Category: Robust Reading Competitions

ICDAR2017 Competition on Reading Chinese Text in the Wild (RCTW-17), paper: https://arxiv.org/pdf/1708.09585.pdf
ICDAR2017 Robust Reading Challenge on COCO-Text

Category: Text in Video

ICDAR2017 Competition on Arabic Text Detection and Recognition in Multi-resolution Video Frames
Competition on Video Script Identification

Category: Forensics

Competition on File Type Identification

Category: Miscellaneous Competitions

ICDAR2017 Competition on Multi-font and Multi-Size Digitally Represented Arabic Text

ref: http://mac.xmu.edu.cn/valse2017/ppt/Invited/VALSE2017_bx.pdf

ICDAR2015
Scene text recognition
Text recognition in born-digital images
Also includes a text super-resolution dataset
An interface in OpenCV
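
To make the scene text data above a little more concrete, here is a minimal sketch of reading one ground-truth line. The post itself does not describe the annotation format, so the layout used here is an assumption: one word per line, written as eight comma-separated corner coordinates followed by the transcription, with `###` marking regions to ignore, as commonly seen in ICDAR 2015 scene text ground-truth files. The names `TextRegion` and `parse_gt_line` are hypothetical helpers, not part of any library.

```python
from typing import List, NamedTuple, Tuple


class TextRegion(NamedTuple):
    """One annotated word: four corner points plus its transcription."""
    points: List[Tuple[int, int]]
    text: str
    ignore: bool  # True when the transcription is the "###" don't-care marker


def parse_gt_line(line: str) -> TextRegion:
    """Parse one 'x1,y1,x2,y2,x3,y3,x4,y4,transcription' line (assumed layout)."""
    # Some ground-truth files start with a UTF-8 byte-order mark; drop it.
    line = line.lstrip("\ufeff").rstrip("\r\n")
    # Split on the first 8 commas only, since the transcription may contain commas.
    fields = line.split(",", 8)
    if len(fields) != 9:
        raise ValueError(f"expected 9 comma-separated fields, got {len(fields)}")
    coords = [int(v) for v in fields[:8]]
    # Pair up alternating x and y values into (x, y) corner points.
    points = list(zip(coords[0::2], coords[1::2]))
    text = fields[8]
    return TextRegion(points=points, text=text, ignore=(text == "###"))
```

If a particular release uses a different layout (for example, axis-aligned boxes or space-separated fields), the split and the coordinate pairing would need to change accordingly.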


ICDAR2013
Robust Reading: http://refbase.cvc.uab.es/files/KSU2013.pdf
Chinese handwriting dataset, download
ref: https://www.computer.org/csdl/proceedings/icdar/2013/4999/00/06628568.pdf
Digital documents researcher: https://roundtrippdf.com/en/

2. Some recently published work (from Total-Text)

Detection

MSRFTSN, TextSnake, TextField, Mask TextSpotter, TextNet, Textboxes, EAST, Baseline, SegLink

End-to-end Recognition

TextNet, Mask TextSpotter, Textboxes


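In addition, the following datasets are related to digit and character recognition:
Handwritten digit recognition: MNIST


Street house number dataset: SVHN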
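
As a small illustration of working with MNIST, the sketch below reads the IDX binary layout in which the original MNIST files are distributed: a big-endian header (magic number 2051 for image files, 2049 for label files) followed by raw unsigned bytes. The function names `parse_idx_images` and `parse_idx_labels` are hypothetical, and the code works on bytes already loaded into memory rather than on any specific download.

```python
import struct
from typing import List, Tuple


def parse_idx_images(data: bytes) -> Tuple[int, int, List[bytes]]:
    """Return (rows, cols, images) from the bytes of an IDX image file."""
    # Header: four big-endian 32-bit ints: magic, image count, rows, cols.
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != 2051:
        raise ValueError(f"not an IDX image file (magic={magic})")
    size = rows * cols
    if len(data) < 16 + count * size:
        raise ValueError("file is shorter than its header claims")
    # Each image is rows*cols unsigned bytes, stored row by row.
    images = [data[16 + i * size: 16 + (i + 1) * size] for i in range(count)]
    return rows, cols, images


def parse_idx_labels(data: bytes) -> List[int]:
    """Return the list of labels from the bytes of an IDX label file."""
    # Header: two big-endian 32-bit ints: magic, label count.
    magic, count = struct.unpack(">II", data[:8])
    if magic != 2049:
        raise ValueError(f"not an IDX label file (magic={magic})")
    # Each label is a single unsigned byte.
    return list(data[8:8 + count])
```

The downloadable files are usually gzip-compressed, so in practice the bytes would typically come from something like `gzip.open(path, "rb").read()`, where `path` is wherever the file was saved locally.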


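Some related websites where more datasets can be found:
International Association for Pattern Recognition, Technical Committee 11 (OCR): IAPR-TC11, TC11 datasets
Image recognition TC10 working group: TC10 datasets
ICDAR 2017 summary: https://github.com/cs-chan/Total-Text-Dataset
Robust Reading competitions of recent years: http://rrc.cvc.uab.es/
Research guide: http://www.guide2research.com/conference/icdar-2019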
