Building your own tesseract Docker environment image (hands-on)


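  When deploying an OCR (image-to-text recognition) application on Linux, you need a tesseract environment. Information online is scattered, and the Dockerfiles people have written for various Linux distributions behave inconsistently, so it is hard to tell what actually works. Below is what I arrived at after a day or two of experimenting. It deploys the environment reliably, and I hope it is useful. The process has three stages:
(1) build a base image with the tesseract environment installed;
(2) upload the tessdata language packs to the server, which tesseract uses as reference data during recognition;
(3) build the application image, mount the tessdata language-pack directory at /usr/local/share/tessdata, and set the container environment variable TESSDATA_PREFIX.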

1. Prepare the Dockerfile for the base image. It needs two source archives, tesseract-4.1.1.tar.gz and leptonica-1.80.0.tar.gz, placed next to the Dockerfile:

https://github.com/tesseract-ocr/tesseract/releases/tag/4.1.1

http://www.leptonica.org/source/leptonica-1.80.0.tar.gz

FROM mamohr/centos-java
LABEL AUTHOR="siman(214382122@qq.com)" VERSION="1.0.0" BUILD_DATE="2020-09-01" \
      RESOURCES="https://github.com/tesseract-ocr/tesseract http://www.leptonica.org/index.html https://github.com/tesseract-ocr/tessdata" \
      DESCRIPTION="This image bundles the tesseract-4.1.1 and leptonica-1.80.0 runtime environment \
      on a CentOS base. Build your own tess4j jar application on top of this base image."

# Environment variables for building and running tesseract (shared library path, leptonica headers, pkg-config path)
ENV LD_LIBRARY_PATH="/usr/local/lib" \
    LIBLEPT_HEADERSDIR="/usr/local/include" \
    PKG_CONFIG_PATH="/usr/local/lib/pkgconfig"
# Install the tesseract environment (ADD auto-extracts local .tar.gz archives into /)
ADD   tesseract-4.1.1.tar.gz /
ADD   leptonica-1.80.0.tar.gz /

# Use CentOS package names: pango-devel and cairo-devel (libpango1.0-dev and libcairo-dev are Debian/Ubuntu names that yum cannot find)
RUN   yum -y install file automake libicu-devel pango-devel cairo-devel libjpeg-devel libpng-devel libtiff-devel zlib-devel libtool gcc-c++ make \
      && cd /leptonica-1.80.0 && ./configure && make && make install \
      && cd /tesseract-4.1.1 && ./autogen.sh && ./configure && make && make install \
      && rm -rf /leptonica-1.80.0 /tesseract-4.1.1
# Time zone setting
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
RUN echo 'Asia/Shanghai' >/etc/timezone
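To confirm that the make install step above left libtesseract where LD_LIBRARY_PATH points, a small check can be run inside the container. This is a minimal Java sketch (the class name NativeLibCheck is hypothetical, not part of tess4j): it walks the colon-separated LD_LIBRARY_PATH entries and returns a file matching libtesseract.so*. It assumes the default autotools install prefix /usr/local, so the library lands in /usr/local/lib. tess4j loads the native library through JNA, and on Linux the dynamic loader consults LD_LIBRARY_PATH.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class NativeLibCheck {
    /**
     * Searches each directory in a colon-separated LD_LIBRARY_PATH value and
     * returns a file whose name matches libtesseract.so*, if any exists.
     */
    static Optional<Path> findTesseractLib(String ldLibraryPath) throws IOException {
        if (ldLibraryPath == null) {
            return Optional.empty();
        }
        for (String entry : ldLibraryPath.split(":")) {
            if (entry.isEmpty()) {
                continue; // skip empty entries such as a trailing ':'
            }
            Path dir = Paths.get(entry);
            if (!Files.isDirectory(dir)) {
                continue; // skip entries that do not exist in this container
            }
            // The glob is matched against file names inside dir only.
            try (DirectoryStream<Path> libs = Files.newDirectoryStream(dir, "libtesseract.so*")) {
                for (Path lib : libs) {
                    return Optional.of(lib);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Optional<Path> lib = findTesseractLib(System.getenv("LD_LIBRARY_PATH"));
        System.out.println(lib.map(p -> "found " + p)
                .orElse("libtesseract not found on LD_LIBRARY_PATH"));
    }
}
```

Running it with java NativeLibCheck inside the base image should report a file under /usr/local/lib if the build succeeded.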

2. Build the base image (run this in the directory that contains the Dockerfile and both source archives)

docker build -t tess/centos-java:v1.0 . 

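3. Install the tessdata language packs

 Upload the tessdata language packs to a directory on the server; this article uses /usr/tessdata, which step 5 mounts into the container. Download link (Baidu Netdisk): https://pan.baidu.com/s/1XAvPkTdUXuFq-q2InDREhQ extraction code: 6vjp. The official language data is also published at https://github.com/tesseract-ocr/tessdata.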
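Tesseract looks up each language by its file name, <lang>.traineddata, so the language codes your application can request are exactly the file names in the uploaded directory minus that suffix. The following minimal Java sketch (the class name TessdataLanguages is hypothetical) lists those codes; it defaults to /usr/tessdata, the host directory assumed in this article.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TessdataLanguages {
    static final String SUFFIX = ".traineddata";

    /** Returns the sorted language codes of all *.traineddata files directly inside dir. */
    static List<String> languages(Path dir) throws IOException {
        // Files.list holds an open directory handle, so close the stream.
        try (Stream<Path> files = Files.list(dir)) {
            return files
                    .map(p -> p.getFileName().toString())
                    .filter(name -> name.endsWith(SUFFIX))
                    .map(name -> name.substring(0, name.length() - SUFFIX.length()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/tessdata");
        System.out.println(languages(dir));
    }
}
```

Tesseract combines several of these codes with "+" (for example chi_sim+eng), so this list shows which combinations are available.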

4. Build your own springboot-ocr service image and set the environment variable TESSDATA_PREFIX

FROM tess/centos-java:v1.0
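LABEL AUTHOR="siman(214382122@qq.com)" VERSION="1.0.0" BUILD_DATE="2020-09-01"
VOLUME /tmp
# Copy the jar to an absolute path so it matches the ENTRYPOINT below regardless of WORKDIR
COPY simm-framework-test-1.0.jar /app.jar
EXPOSE 8080
# Must match the container-side path of the tessdata mount in step 5
ENV  TESSDATA_PREFIX="/usr/local/share/tessdata"
# Entry point
ENTRYPOINT ["java","-jar","/app.jar"]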
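Inside the application, the tessdata location can be read from the TESSDATA_PREFIX variable set above, falling back to the same path if it is unset. This is a minimal Java sketch with a hypothetical class name (TessdataPath); taking the environment as a Map keeps it testable. With tess4j, the resulting path would typically be passed to Tesseract#setDatapath, but how your service wires that in is an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

public class TessdataPath {
    // Same path as the ENV line above and the container side of the docker run -v mount.
    static final String DEFAULT_TESSDATA = "/usr/local/share/tessdata";

    /** Returns TESSDATA_PREFIX if it is set and non-blank, otherwise the default directory. */
    static Path resolve(Map<String, String> env) {
        String prefix = env.get("TESSDATA_PREFIX");
        if (prefix == null || prefix.trim().isEmpty()) {
            return Paths.get(DEFAULT_TESSDATA);
        }
        return Paths.get(prefix.trim());
    }

    public static void main(String[] args) {
        // In the container this prints /usr/local/share/tessdata.
        System.out.println(resolve(System.getenv()));
    }
}
```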

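 5. Build the application image, then start the container with the tessdata directory mounted

Run the build in the directory that contains the Dockerfile above and simm-framework-test-1.0.jar (the tag ocr-api:v1.0 matches the docker run command below):

docker build -t ocr-api:v1.0 .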

docker run -it -v /usr/tessdata:/usr/local/share/tessdata -p 8080:8080 --name="ocr-api" ocr-api:v1.0
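With a -v bind mount, Docker creates the host directory if it does not exist, so a mistyped host path gives the container an empty tessdata directory and recognition fails only later. A fail-fast startup check can catch this. The following minimal Java sketch uses a hypothetical class name (TessdataMountCheck) and a hypothetical required language set (eng, chi_sim); replace it with the languages your service actually uses.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TessdataMountCheck {
    /** Returns the required language codes whose <lang>.traineddata file is missing from dir. */
    static List<String> missingLanguages(Path dir, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String lang : required) {
            // A missing or empty mount makes every language show up as missing.
            if (!Files.isRegularFile(dir.resolve(lang + ".traineddata"))) {
                missing.add(lang);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String prefix = System.getenv("TESSDATA_PREFIX");
        Path dir = Paths.get(prefix != null ? prefix : "/usr/local/share/tessdata");
        // Hypothetical language set; adjust to your service.
        List<String> missing = missingLanguages(dir, Arrays.asList("eng", "chi_sim"));
        if (!missing.isEmpty()) {
            System.err.println("Missing traineddata in " + dir + ": " + missing);
            System.exit(1);
        }
        System.out.println("tessdata OK: " + dir);
    }
}
```

The same check could be called from the Spring Boot application at startup instead of running as a separate program; which option fits better depends on your service.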

