We will recognize the Chinese characters in the image above.
Install the Tesseract software and the Python libraries:
https://www.cnblogs.com/sea-stream/p/10961580.html
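Then create a new folder named test, put the image above into it, and create a new file test.py in the same folder
with the following content:
#coding=utf-8
from PIL import Image
import pytesseract
# The lines above are just imports; the single line below does the actual text recognition
text=pytesseract.image_to_string(Image.open('xxx.png'),lang='chi_sim')
print(text)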
The directory layout is as follows:
Running the script may produce an error:
C:\Users\k\Desktop\test>python test.py
Traceback (most recent call last):
  File "test.py", line 5, in <module>
    text=pytesseract.image_to_string(Image.open('xxx.png'),lang='chi_sim')
  File "C:\Users\k\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 309, in image_to_string
    }[output_type]()
  File "C:\Users\k\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 308, in <lambda>
    Output.STRING: lambda: run_and_get_output(*args),
  File "C:\Users\k\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 218, in run_and_get_output
    run_tesseract(**kwargs)
  File "C:\Users\k\Anaconda3\lib\site-packages\pytesseract\pytesseract.py", line 194, in run_tesseract
    raise TesseractError(status_code, get_errors(error_string))
pytesseract.pytesseract.TesseractError: (1, 'Error opening data file C:\\Program Files (x86)\\Tesseract-OCR/tessdata/chi_sim.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed loading language \'chi_sim\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
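This happens because the default tesseract-ocr installation does not include the Chinese language data. Download the file chi_sim.traineddata and put it into the tessdata folder of the Tesseract-OCR installation directory, for example D:\Program Files (x86)\Tesseract-OCR\tessdata (the error message above shows the folder Tesseract actually searched).
Link: https://pan.baidu.com/s/1c-fveIYnm1sQHxX9WRpUZw
Extraction code: 9ovq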
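Before running the script again, it can help to confirm that the file really landed in the folder Tesseract searches. The following is a minimal sketch using only the standard library; the helper name has_language_data is made up for illustration, and the folder you pass in is an assumption that should match your own installation.

```python
from pathlib import Path


def has_language_data(tessdata_dir, lang="chi_sim"):
    """Return True if <lang>.traineddata is a file inside tessdata_dir.

    has_language_data is a hypothetical helper, not part of pytesseract.
    """
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()
```

For example, has_language_data(r"D:\Program Files (x86)\Tesseract-OCR\tessdata") should return True once chi_sim.traineddata has been copied there. If it returns False, check whether the file was copied to a different Tesseract installation than the one named in the error message.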
Run it again:
python test.py
The output is:
C:\Users\k\Desktop\test>python test.py
風急天高猿嘯衷′ 渚麥冒麥少丑弓飛口。
u邊洛木蕭蕭下′ 不〖長江滾滾來。
萬 悲禾火常作畜′ 年多病獨登台。
艱難苦恨縈霜 渣倒新停澍酉木不=
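To get a clearer message than the long traceback shown earlier, the script could ask Tesseract which languages are installed before calling image_to_string. This is only a sketch: it assumes a pytesseract release recent enough to provide get_languages(), and the names missing_langs and ocr_image are invented for this example.

```python
def missing_langs(requested, available):
    """Return the parts of a Tesseract lang string (e.g. 'chi_sim+eng')
    that do not appear in the list of available languages."""
    return [name for name in requested.split("+") if name not in available]


def ocr_image(path, lang="chi_sim"):
    """Run OCR on the image at path, failing early if language data is missing."""
    # Imported here so that missing_langs can be used without the OCR stack.
    from PIL import Image
    import pytesseract

    # Assumption: this pytesseract version provides get_languages().
    available = pytesseract.get_languages(config="")
    missing = missing_langs(lang, available)
    if missing:
        raise RuntimeError(
            "missing Tesseract language data: " + ", ".join(missing)
        )
    return pytesseract.image_to_string(Image.open(path), lang=lang)
```

With this in place, test.py could simply call print(ocr_image('xxx.png')). If chi_sim.traineddata is not installed, the script stops with a one-line message naming the missing language.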
References: