Setting up PaddleOCR

Reposted article, 2021-10-25 11:33

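If you work in PyCharm, you only need to install paddlepaddle, paddleocr, and the packages listed in requirements.txt.

A hands-on record of deploying the CPU version of PaddleOCR with PaddleHub.

PaddleOCR: release/2.2 branch
PaddlePaddle 2.1.3
PaddleHub 2.1.0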

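(1) Windows 10 deployment

1. Prepare the Python environment

Python installation is omitted here (this article uses Python 3.7.7).

2. Install PaddleHub, the PaddlePaddle tool for pretrained-model management and transfer learning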

pip install paddlehub -i https://pypi.tuna.tsinghua.edu.cn/simple

3. Install the third-party libraries shapely and pyclipper

pip install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple

On Windows, do not install shapely with a plain pip install shapely; otherwise installing the text recognition module later fails with:

File "c:\programdata\anaconda3\lib\site-packages\shapely\geos.py", line 145, in <module>
    _lgeos = CDLL(os.path.join(sys.prefix, 'Library', 'bin', 'geos_c.dll'))
File "c:\programdata\anaconda3\lib\ctypes\__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: [WinError 126] The specified module could not be found.

The correct approach is to first run this script:

import pip._internal.pep425tags
print(pip._internal.pep425tags.get_supported())

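This lists the wheel tags your Python supports. (pip._internal.pep425tags is an internal pip module and is missing from newer pip releases; there, pip debug --verbose prints the compatible tags.) Then download the matching shapely wheel from https://www.lfd.uci.edu/~gohlke/pythonlibs/#shapely; mine was Shapely-1.7.1-cp37-cp37m-win_amd64.whl. Finally, install shapely from the wheel file:

pip install .\Shapely-1.7.1-cp37-cp37m-win_amd64.whl

If the following packages are not installed yet, install them as well: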
python -m pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple
pip install scikit-image
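
The wheel picked in step 3 has to match the interpreter's tags. As a rough illustration, the sketch below splits a wheel filename into its three tags and compares the Python tag with the running interpreter. The helper names are hypothetical, and the interpreter tag assumes CPython.

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename: " + filename)
    # The last three dash-separated fields are the tags.
    return tuple(parts[-3:])


def interpreter_python_tag():
    """Return the CPython-style tag of the running interpreter, e.g. cp37."""
    return "cp{}{}".format(sys.version_info.major, sys.version_info.minor)


if __name__ == "__main__":
    python_tag, abi_tag, platform_tag = parse_wheel_tags(
        "Shapely-1.7.1-cp37-cp37m-win_amd64.whl")
    print(python_tag, abi_tag, platform_tag)
    print("matches this interpreter:", python_tag == interpreter_python_tag())
```

If the Python tag does not match, pip will refuse the wheel as unsupported on this platform, so checking first saves a download.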

4. Download the PaddleOCR code

git clone https://github.com/PaddlePaddle/PaddleOCR

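5. Download the inference model

5.1. Create an inference folder under the PaddleOCR directory to hold the model files

5.2. Download the Chinese-English recognition model ch_PP-OCRv2_rec (inference model) and extract it into the inference folder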

https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/models_list.md#多語言識別模型

It is best to store the model in exactly this location; otherwise installing the text recognition module later may fail with:

ModuleNotFoundError: No module named 'tools'

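6. Update the model path in params.py

In PaddleOCR\deploy\hubserving\ocr_rec\params.py, set the cfg.rec_model_dir option to the path of the model downloaded above: "./inference/ch_PP-OCRv2_rec_infer/"

7. Install PaddleOCR's text recognition service module into PaddleHub

Run this command from the PaddleOCR directory: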

hub install deploy\hubserving\ocr_rec\

8. Start the service

 hub serving start -m ocr_rec

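(2) Linux deployment (CentOS Linux release 7.5.1804 (Core))

1. Prepare the Python environment

Python installation is omitted here (this article uses Python 3.7.7).

2. Install PaddleHub, the PaddlePaddle tool for pretrained-model management and transfer learning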

pip install paddlehub -i https://pypi.tuna.tsinghua.edu.cn/simple

3. Install the third-party libraries shapely and pyclipper

pip install shapely -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pyclipper -i https://pypi.tuna.tsinghua.edu.cn/simple

4. Download the PaddleOCR code

git clone https://github.com/PaddlePaddle/PaddleOCR

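5. Download the inference model

5.1. Create an inference folder under the PaddleOCR directory to hold the model files

5.2. Download the Chinese-English recognition model ch_PP-OCRv2_rec (inference model) and extract it into the inference folder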

https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_ch/models_list.md#多語言識別模型

Extract with tar xvf ch_PP-OCRv2_rec_infer.tar rather than tar zxvf ch_PP-OCRv2_rec_infer.tar: the archive is a plain tar file, not gzip-compressed, so the z flag fails with:

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
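
The same extraction can be scripted without having to know whether the archive is compressed. This is a minimal sketch using the standard tarfile module; the function name and the default target folder are illustrative choices, and it should only be run on archives from a trusted source.

```python
import tarfile
from pathlib import Path


def extract_model(archive_path, target_dir="inference"):
    """Extract a model archive into target_dir and return the top-level names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression (none, gzip, bz2, xz),
    # which avoids the "not in gzip format" failure of a hard-coded z flag.
    with tarfile.open(str(archive_path), "r:*") as archive:
        names = archive.getnames()
        archive.extractall(str(target))
    return sorted({name.split("/")[0] for name in names})


if __name__ == "__main__":
    import sys
    if len(sys.argv) > 1:
        print(extract_model(sys.argv[1]))
```

Run from the PaddleOCR directory with the downloaded .tar file as the argument, this unpacks the model under ./inference/.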

It is best to store the model in exactly this location; otherwise installing the text recognition module later may fail with:

ModuleNotFoundError: No module named 'tools'

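6. Update the model path in params.py

In PaddleOCR/deploy/hubserving/ocr_rec/params.py, set the cfg.rec_model_dir option to the path of the model downloaded above: "./inference/ch_PP-OCRv2_rec_infer/"

7. Install PaddleOCR's text recognition service module into PaddleHub

Run this command from the PaddleOCR directory:

hub install deploy/hubserving/ocr_rec/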

8. Start the service

 hub serving start -m ocr_rec

Separately, calling the OCR model through paddlehub raised this error:

AttributeError: module 'paddlehub' has no attribute 'Module'

The code is just two lines:

import paddlehub as hub
ocr = hub.Module(name="chinese_ocr_db_crnn_server")

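They run fine when typed into the Python interpreter on the command line, but fail with the error above when run from a Python file. After a lot of digging, the cause turned out to be that I had named my script paddlehub.py, so import paddlehub picked up my own file instead of the installed paddlehub package.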
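
This kind of name clash can be detected before it bites. The sketch below lists the .py files in a folder whose names also resolve to a module elsewhere on the import path; the function name is hypothetical, and the check only sees modules that live on disk, not built-in ones.

```python
import importlib.machinery
import sys
from pathlib import Path


def find_shadowing_scripts(directory, search_path=None):
    """Return the .py files in directory whose names also resolve to a
    module somewhere else on the import path."""
    directory = Path(directory).resolve()
    # Search every import path entry except the script directory itself.
    paths = [p for p in (search_path or sys.path)
             if p and Path(p).resolve() != directory]
    shadowing = []
    for script in sorted(directory.glob("*.py")):
        spec = importlib.machinery.PathFinder.find_spec(script.stem, paths)
        if spec is not None:
            shadowing.append(script.name)
    return shadowing


if __name__ == "__main__":
    # A script named paddlehub.py would be reported here when the
    # paddlehub package is installed.
    print(find_shadowing_scripts("."))
```

Renaming any file it reports (and deleting the matching entry in __pycache__, if one exists) restores the normal import.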

(3) Text recognition test

Go to the PaddleOCR\tools directory (PaddleOCR/tools on Linux). For simplicity, put an image named 4.jpg in that directory, then run:

python test_hubserving.py http://127.0.0.1:8866/predict/ocr_rec 4.jpg
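
For calling the service from your own code, the following is a minimal client sketch using only the standard library. It is not the contents of test_hubserving.py; it assumes the request and response shapes that the Java code later in this article relies on: a JSON body {"images": [base64 string]}, and a reply with "status" equal to "000" plus a "results" list of lists whose items carry a "text" field. The function names are illustrative.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes):
    """Build the JSON request body: {"images": [<base64 string>]}."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({"images": [encoded]}).encode("utf-8")


def extract_texts(response):
    """Collect every "text" field from a parsed service response."""
    if response.get("status") != "000":
        return []
    return [item["text"]
            for group in response.get("results", [])
            for item in group]


def recognize(image_path, url="http://127.0.0.1:8866/predict/ocr_rec"):
    """POST one image to the service and return the recognized strings."""
    with open(image_path, "rb") as f:
        body = build_payload(f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as reply:
        return extract_texts(json.loads(reply.read().decode("utf-8")))
```

With the service running, recognize("4.jpg") should return the recognized lines as a list of strings.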

(4) Java service wrapper (experimental)

To make the recognition service easier to use in engineering projects, wrap it in a Java service.

@Autowired
private RestTemplate restTemplate;

4.1. Assemble the request

/**
 * Assemble the request entity (JSON body plus headers).
 * @param imageFile the uploaded image
 * @return the HTTP entity to send to the OCR service
 * @throws Exception if the image cannot be read
 */
private static HttpEntity<String> makeHttpEntityHub(MultipartFile imageFile) throws Exception {
    // Base64-encode the image bytes
    String content = Base64Utils.encodeToString(IOUtils.toByteArray(imageFile.getInputStream()));
    JSONObject params = new JSONObject();
    params.put("images", new String[] { content });
    HttpHeaders headers = new HttpHeaders();
    headers.setContentType(MediaType.parseMediaType("application/json"));
    return new HttpEntity<String>(params.toJSONString(), headers);
}

4.2. Recognition request

@PostMapping("/text/recognition")
public List<String> textRecognition(@RequestParam("file") MultipartFile imageFile) throws Exception {
    // Service address. TODO: make this configurable
    String url = "http://127.0.0.1:8866/predict/ocr_rec";
    // Assemble the request
    HttpEntity<String> httpEntity = makeHttpEntityHub(imageFile);
    // Call the text recognition service
    ResponseEntity<JSONObject> responseEntity = restTemplate.exchange(url, HttpMethod.POST, httpEntity, JSONObject.class);
    // Parse the recognition results
    List<String> list = Lists.newArrayList();
    if (responseEntity.getStatusCodeValue() == 200 && "000".equals(responseEntity.getBody().getString("status"))) {
        JSONArray results = responseEntity.getBody().getJSONArray("results");
        for (int i = 0; i < results.size(); i++) {
            JSONArray contents = results.getJSONArray(i);
            for (int j = 0; j < contents.size(); j++) {
                JSONObject content = contents.getJSONObject(j);
                list.add(content.getString("text"));
            }
        }
    }
    // Return the extracted text
    return list;
}

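(5) Chaining multiple modules to improve recognition

The steps above deploy only the text recognition module, and its results on their own are very poor. Chaining the classification and detection modules in front of it improves them markedly.

1. Download the classification and detection inference models and extract them into the inference folder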

PaddleOCR/models_list.md at release/2.3 · PaddlePaddle/PaddleOCR

Classification model: ch_ppocr_mobile_slim_v2.0_cls

Detection model: ch_PP-OCRv2_det

2. Update the model paths in params.py

Update the model settings in PaddleOCR\deploy\hubserving\ocr_system\params.py:

# Detection model
cfg.det_model_dir = "./inference/ch_PP-OCRv2_det_infer/"
# Recognition model
cfg.rec_model_dir = "./inference/ch_PP-OCRv2_rec_infer/"
# Classification model
cfg.cls_model_dir = "./inference/ch_ppocr_mobile_v2.0_cls_infer/"
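
Since a wrong path only surfaces later as an install or startup error, it can help to check the three directories first. The sketch below does that under one stated assumption: that each exported inference model folder contains inference.pdmodel and inference.pdiparams. Adjust EXPECTED_FILES if your downloaded archives unpack differently. The function name is hypothetical.

```python
from pathlib import Path

# Assumed file names inside an inference model folder; adjust if yours differ.
EXPECTED_FILES = ("inference.pdmodel", "inference.pdiparams")


def missing_model_files(model_dirs, root="."):
    """Map each model directory to the expected files it lacks."""
    problems = {}
    for model_dir in model_dirs:
        base = Path(root) / model_dir
        missing = [name for name in EXPECTED_FILES
                   if not (base / name).is_file()]
        if missing:
            problems[model_dir] = missing
    return problems


if __name__ == "__main__":
    configured = [
        "./inference/ch_PP-OCRv2_det_infer/",
        "./inference/ch_PP-OCRv2_rec_infer/",
        "./inference/ch_ppocr_mobile_v2.0_cls_infer/",
    ]
    for model_dir, missing in missing_model_files(configured).items():
        print(model_dir, "is missing", ", ".join(missing))
```

Run it from the PaddleOCR directory; no output means every configured folder has the expected files.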

3. Install PaddleOCR's text recognition service module (ocr_system) into PaddleHub

Run the command from the PaddleOCR directory:

Windows 10:
hub install deploy\hubserving\ocr_system\

Linux:
hub install deploy/hubserving/ocr_system/

On Linux, imgaug and lmdb must also be installed; otherwise the install fails with:

ModuleNotFoundError: No module named 'imgaug'
ModuleNotFoundError: No module named 'lmdb'

Install them with:

pip install imgaug -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install lmdb -i https://pypi.tuna.tsinghua.edu.cn/simple

4. Start the chained service

Windows:
hub serving start -m ocr_system
Linux:
nohup hub serving start -m ocr_system &
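
Because nohup puts the service in the background, there is no console output to confirm it came up. A minimal sketch of a readiness check: it only tests that something accepts TCP connections on port 8866 (the port used in the test URLs in this article), not that the OCR module itself loaded. The function name is illustrative.

```python
import socket


def is_port_open(host="127.0.0.1", port=8866, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("service reachable:", is_port_open())
```

If it reports False, the nohup.out file written by the command above is the first place to look.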

5. Text recognition test

Go to the PaddleOCR\tools directory (PaddleOCR/tools on Linux). For simplicity, put an image named 4.jpg in that directory, then run:

python test_hubserving.py http://127.0.0.1:8866/predict/ocr_system 4.jpg
