Image Recognition in Java with the Open-Source Library Tesseract

Reposted from the original article, 2019-09-08 21:46

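Tesseract-OCR supports Chinese recognition, is open source, and ships with a complete set of training tools. This makes it a first choice for fast, low-cost development.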

Tess4J is a Java wrapper for Tesseract, built on JNA, that makes the engine available to Java applications on the PC.

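Development of the Tesseract OCR engine began at HP Labs in 1985. By 1995 it had become one of the three most accurate OCR engines in the industry. However, HP soon decided to leave the OCR business, and Tesseract was shelved.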

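Several years later, HP realized that rather than leaving Tesseract on the shelf, it would be better to contribute it to the open-source community and give it a new life. In 2005, Tesseract was obtained by the Information Science Research Institute in Nevada, USA, which enlisted Google to improve it, fix bugs, and optimize it.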

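Tesseract was released as an open-source project on Google Code (Google Project Hosting). Since Google Code shut down, the project has been hosted on GitHub at https://github.com/tesseract-ocr/tesseract.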

Add the Tess4J dependency to the Maven pom.xml:

    <!-- https://mvnrepository.com/artifact/net.sourceforge.tess4j/tess4j -->
    <dependency>
        <groupId>net.sourceforge.tess4j</groupId>
        <artifactId>tess4j</artifactId>
        <version>3.4.0</version>
    </dependency>

Implementation:

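    import java.io.File;

    import net.sourceforge.tess4j.Tesseract;
    import net.sourceforge.tess4j.TesseractException;

    public class OcrDemo {
        public static void main(String[] args) {
            File imageFile = new File("input dir/shuzi.png");
            Tesseract tesseract = new Tesseract();
            // The training data (tessdata) directory must be specified.
            // Download the training data from https://github.com/tesseract-ocr/tessdata
            tesseract.setDatapath("E:\\itcast\\env\\tess4j\\tessdata");
            // Note: the default language is English; Chinese recognition must be set explicitly
            // (this needs chi_sim.traineddata in the tessdata directory).
            tesseract.setLanguage("chi_sim");
            try {
                String result = tesseract.doOCR(imageFile);
                System.out.println(result);
            } catch (TesseractException e) {
                System.err.println(e.getMessage());
            }
        }
    }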
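The OCR code only works if the directory passed to setDatapath contains a .traineddata file for every language passed to setLanguage. Tesseract also accepts several languages joined with "+" (for example chi_sim+eng), and each code needs its own file. Below is a minimal sketch of a pre-flight check that uses only the JDK. TessdataCheck, parseLanguages and missingTrainedData are hypothetical names and are not part of Tess4J. The sketch assumes the .traineddata files sit directly inside the directory given to setDatapath, as in the example above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Tess4J): before calling
 * Tesseract.setDatapath/setLanguage, report which requested languages
 * have no matching .traineddata file in the tessdata directory.
 * Assumption: the files live directly inside that directory.
 */
final class TessdataCheck {

    private TessdataCheck() {}

    /** Splits a Tesseract language spec such as "chi_sim+eng" into its codes. */
    static List<String> parseLanguages(String spec) {
        List<String> codes = new ArrayList<>();
        for (String part : spec.split("\\+")) {
            String code = part.trim();
            if (!code.isEmpty()) {
                codes.add(code);
            }
        }
        return codes;
    }

    /** Returns the codes whose <code>.traineddata</code> file is not a regular file in tessdataDir. */
    static List<String> missingTrainedData(Path tessdataDir, String spec) {
        List<String> missing = new ArrayList<>();
        for (String code : parseLanguages(spec)) {
            Path file = tessdataDir.resolve(code + ".traineddata");
            if (!Files.isRegularFile(file)) {
                missing.add(code);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Same directory and language as the Tess4J example; override via arguments.
        String dir = args.length > 0 ? args[0] : "E:\\itcast\\env\\tess4j\\tessdata";
        String spec = args.length > 1 ? args[1] : "chi_sim";
        Path tessdata = Paths.get(dir);
        List<String> missing = missingTrainedData(tessdata, spec);
        if (missing.isEmpty()) {
            System.out.println("All traineddata files found in " + tessdata);
        } else {
            System.out.println("Missing traineddata for " + missing
                    + "; download them from https://github.com/tesseract-ocr/tessdata");
        }
    }
}
```

Running the check before creating the Tesseract instance turns a failed engine initialization into a readable message. If it reports chi_sim as missing, download chi_sim.traineddata from the tessdata repository into that directory.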
