Preface:
Recently I have been learning some OCR-related fundamentals, including object detection and natural language processing.
As it happens, the Digital China competition has a related track:
https://www.datafountain.cn/competitions/334/details/rule
So I wanted to get some hands-on practice. In doing so I found that I was not familiar with how the data labels are processed, nor with the overall detection-and-recognition pipeline, so building everything from scratch was still quite difficult.
Fortunately, there are baselines that others have open-sourced, covering both detection and recognition, and they are genuinely helpful for understanding how OCR works end to end.
1) The first baseline: AdvancedEAST + CRNN
https://github.com/Tianxiaomo/Cultural_Inheritance-Recognizing_Chinese_Calligraphy_in_Multiple_Scenarios
2) A newer baseline: EAST + ocr_densenet
https://github.com/DataFountainCode/huawei_code_share
There are also the original open-source EAST and AdvancedEAST implementations:
https://github.com/argman/EAST
https://github.com/huoyijie/AdvancedEAST
CRNN source code:
https://github.com/bgshih/crnn
as well as densenet and others; all of these are good learning resources:
https://github.com/yinchangchang/ocr_densenet
PART1: EAST
Below is a walkthrough of the EAST code.
Training sample format:
img_1.jpg
img_1.txt
img_2.jpg
img_2.txt
(this can be produced with convert_to_txt.py from the second baseline)
That is, the training set contains the images plus one annotation file per image, giving the four corner coordinates and the text of each text region.
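For reference, an ICDAR2015-style annotation file (the format the argman/EAST training code reads) has one line per text region: the four corner points in clockwise order followed by the transcription. The line below is only illustrative; the exact output of convert_to_txt.py may differ slightly.

x1,y1,x2,y2,x3,y3,x4,y4,text
377,117,463,117,465,130,378,130,Genaxis Theatre

With the data in this form, training is started with the following command.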
python multigpu_train.py --gpu_list=0 --input_size=512 --batch_size_per_gpu=14 \
    --checkpoint_path=/tmp/east_icdar2015_resnet_v1_50_rbox/ --text_scale=512 \
    --training_data_path=/data/ocr/icdar2015/ --geometry=RBOX \
    --learning_rate=0.0001 --num_readers=24 \
    --pretrained_model_path=/tmp/resnet_v1_50.ckpt
Once training is finished, run evaluation:
python eval.py --test_data_path=./tmp/test_image/ --gpu_list=0 --checkpoint_path=./tmp/east_icdar2015_resnet_v1_50_rbox/ --output_dir=./tmp/output/
This loads the trained model and runs it on the test images.
Bug fixes:
1. lanms fails to compile. Modify lanms/Makefile, changing python3-config to python-config, then run make again. The lines in question are:
CXXFLAGS = -I include -std=c++11 -O3 $(shell python3-config --cflags)
LDFLAGS = $(shell python3-config --ldflags)
2. The following error appears when running evaluation:
Traceback (most recent call last):
  File "eval.py", line 194, in <module>
    tf.app.run()
  File "/usr/local/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "eval.py", line 160, in main
    boxes, timer = detect(score_map=score, geo_map=geometry, timer=timer)
  File "eval.py", line 98, in detect
    boxes = lanms.merge_quadrangle_n9(boxes.astype('float32'), nms_thres)
  File "/work/ocr/EAST/lanms/__init__.py", line 12, in merge_quadrangle_n9
    from .adaptor import merge_quadrangle_n9 as nms_impl
ImportError: dynamic module does not define module export function (PyInit_adaptor)
nms_locality.nms_locality() is a pure-Python implementation and is much slower than the C++ code, but if you only want to test, it works fine; both methods give the same results. After changing lanms.merge_quadrangle_n9() in eval.py to nms_locality.nms_locality(), the error goes away.
In other words, since the C++ build cannot be imported, just call the Python implementation directly; it is only a bit slower, and the results are identical.
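Concretely, the swap in eval.py looks roughly like this. This is only a sketch: it assumes the pure-Python NMS shipped with the repo (locality_aware_nms.py) is imported as nms_locality, which is how eval.py refers to it.

import locality_aware_nms as nms_locality  # pure-Python locality-aware NMS
# import lanms                             # C++ module that failed to build

# original call in detect():
# boxes = lanms.merge_quadrangle_n9(boxes.astype('float32'), nms_thres)
# replacement:
boxes = nms_locality.nms_locality(boxes.astype('float32'), nms_thres)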
PART2: CRNN
Reference source code: https://github.com/bai-shang/OCR_TF_CRNN_CTC
Training procedure:
1) Convert the data so that each image is paired with its label (a small sketch for writing such a list file follows the example below).
For example: image_list.txt
90kDICT32px/1/2/373_coley_14845.jpg coley
90kDICT32px/17/5/176_Nevadans_51437.jpg nevadans
Note: make sure that images can be read from the path you specified, such as:
path/to/90kDICT32px/1/2/373_coley_14845.jpg
path/to/90kDICT32px/17/5/176_Nevadans_51437.jpg
.......
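If the labels are not in this form yet, a few lines of Python are enough to write the list file. The sample data below is made up; ./data/train.txt matches the --anno_file argument used in the conversion command that follows.

# write an annotation file with one "relative/path.jpg label" pair per line
samples = [
    ('90kDICT32px/1/2/373_coley_14845.jpg', 'coley'),
    ('90kDICT32px/17/5/176_Nevadans_51437.jpg', 'nevadans'),
]
with open('./data/train.txt', 'w', encoding='utf-8') as f:
    for image_path, label in samples:
        f.write('{} {}\n'.format(image_path, label))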
Convert the data to TFRecords from the command line:
python tools/create_crnn_ctc_tfrecord.py \
--image_dir ./data/ --anno_file ./data/train.txt --data_dir ./tfrecords/ \
--validation_split_fraction 0.1
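As a quick sanity check, you can count the records in the generated file with the TF 1.x record iterator. The file name below is an assumption -- adjust it to whatever create_crnn_ctc_tfrecord.py actually writes into ./tfrecords/.

import tensorflow as tf

# count records in one generated TFRecord file (path is assumed)
record_path = './tfrecords/train.tfrecord'
num_records = sum(1 for _ in tf.python_io.tf_record_iterator(record_path))
print('{}: {} records'.format(record_path, num_records))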
Issues:
1) Initial bug: TypeError: None has type NoneType, but expected one of: int, long
This is caused by characters that do not appear in the character map, i.e. the dictionary is incomplete. Add an extra entry for out-of-dictionary characters, "<undefined>": 6736.
Correspondingly, in the original conversion code, unknown characters are mapped to this new class:
def _string_to_int(label):
    # convert string label to int list by char map
    char_map_dict = json.load(open(FLAGS.char_map_json_file, 'r'))
    int_list = []
    for c in label:
        int_list.append(char_map_dict.get(c, 6736))  # map characters missing from the map to the new class 6736
    return int_list
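To add the "<undefined>": 6736 entry mentioned above, the char map JSON can be patched once with a small script like the one below (the path is an assumption -- use whatever --char_map_json_file points to):

import json

char_map_path = 'char_map/char_map.json'  # assumed location of the char map
with open(char_map_path, 'r', encoding='utf-8') as f:
    char_map_dict = json.load(f)
char_map_dict['<undefined>'] = 6736  # extra class for out-of-dictionary characters
with open(char_map_path, 'w', encoding='utf-8') as f:
    json.dump(char_map_dict, f, ensure_ascii=False, indent=2)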
2) Python 2 runs into many encoding problems here; switching to Python 3 is recommended.
def _bytes_feature(value):
    # on Python 3, string objects must be converted to bytes first
    if type(value) is str and sys.version_info[0] > 2:
        value = value.encode('utf-8')
    if not isinstance(value, list):
        value = [value]
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))
While debugging, print the intermediate results step by step to track down the cause of the problem:
try:
    print(tf.train.Feature(int64_list=tf.train.Int64List(value=value)))
except Exception:
    # fall back to printing the raw value that failed to convert
    print(value)