Recognizing text in images with tess4j

Reposted article · 2020-07-29 21:14 · Tags: tess4j, ocr

First published at: https://www.simonjia.top/495.html

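Sometimes I come across a good video presentation and want to keep a record of the slide content, which means taking screenshots and running OCR on them. Most online tools limit how many times you can use them, and the free ones make you import and export one image at a time, with captchas and rate limits on top, so they are inconvenient. tess4j, a Java wrapper for the Tesseract OCR engine, is one of the more popular open-source OCR libraries at the moment. I downloaded it today and tried it out, and it works well: recognition accuracy and speed are both good, and it covers what I need without paying for a membership. (P.S. The OCR tool from 得力 (Deli) also has a good UI and good speed.)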

Project download (clone) address: https://github.com/nguyenq/tess4j.git

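Downloading straight to my local machine was very slow, at only a few dozen KB/s, which was unbearable. Fetching the project with wget on an Alibaba Cloud server and then downloading it from there to my machine was much faster.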

The download does not include Chinese language data by default, so you need to fetch it yourself: https://github.com/tesseract-ocr/tessdata/blob/master/chi_sim.traineddata
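
OCR cannot load a language whose data file is missing from the data path, so it can help to check for the file before running anything. The following is only a minimal sketch using the standard library. It assumes the downloaded file is saved as `chi_sim.traineddata` inside the project's `tessdata` folder; the class and method names (`TessdataCheck`, `trainedDataFile`, `hasLanguage`) are hypothetical helpers and not part of tess4j.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of tess4j): checks that a language file is in place.
final class TessdataCheck {

	// Tesseract language files are named <language>.traineddata
	static Path trainedDataFile(Path tessdataDir, String language) {
		return tessdataDir.resolve(language + ".traineddata");
	}

	// True only when the language file exists as a regular file
	static boolean hasLanguage(Path tessdataDir, String language) {
		return Files.isRegularFile(trainedDataFile(tessdataDir, language));
	}

	public static void main(String[] args) {
		// Assumed location, based on the example project path used in this article
		Path tessdata = Paths.get("E:\\github\\tess4j-master", "tessdata");
		if (!hasLanguage(tessdata, "chi_sim")) {
			System.err.println("Missing " + trainedDataFile(tessdata, "chi_sim"));
		}
	}
}
```

Running the check first turns a missing download into a clear message instead of a failure in the middle of recognition.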

Then write a main method:

package com.recognition.software.jdeskew;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

import java.io.*;
import java.nio.charset.StandardCharsets;

public class Tess4JTest {
	public static void main(String[] args) throws IOException {
		// Project root; it holds both the pics folder and the tessdata folder
		String path = "E:\\github\\tess4j-master";
		File fileDir = new File(path, "pics");
		// listFiles() returns null when the path is not a readable directory
		File[] files = fileDir.listFiles();
		if (files == null) {
			System.err.println("Not a readable directory: " + fileDir);
			return;
		}
		ITesseract instance = new Tesseract();
		// Location of the trained language data; set once, outside the loop
		instance.setDatapath(new File(path, "tessdata").getCanonicalPath());
		// chi_sim: Simplified Chinese; eng: English. Pick the language you need
		instance.setLanguage("chi_sim");
		for (File file : files) {
			if (!file.isFile()) {
				continue;
			}
			// File name without its extension (the original kept the trailing dot)
			String name = file.getName();
			int dot = name.lastIndexOf('.');
			String fileName = dot > 0 ? name.substring(0, dot) : name;
			String result;
			try {
				long startTime = System.currentTimeMillis();
				result = instance.doOCR(file);
				long endTime = System.currentTimeMillis();
				System.out.println("Time is: " + (endTime - startTime) + " ms");
			} catch (TesseractException e) {
				// Skip this image instead of writing "null" to the output file
				e.printStackTrace();
				continue;
			}
			System.out.println("result: ");
			System.out.println(result);
			// Write each image's text to its own file; a single shared file also works, depending on your needs
			File out = new File("C:\\Users\\Administrator\\Desktop", fileName + ".txt");
			try (BufferedWriter bw = new BufferedWriter(
					new OutputStreamWriter(new FileOutputStream(out), StandardCharsets.UTF_8))) {
				bw.write(result);
				bw.newLine();
			}
		}
	}
}

Here I read a pics folder, recognize every image in it, and write the text from each image to its own txt file.
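
The per-image output naming described above can be sketched on its own, along with a filter that skips non-image files in the folder. This is a minimal illustration using only the standard library. `OcrFileNames`, `extension`, `isImage`, and `outputName` are hypothetical names, and the extension list is an assumption, not the full set of formats tess4j accepts.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of tess4j): file-name handling for batch OCR.
final class OcrFileNames {

	// Assumed set of extensions to treat as images; adjust as needed
	private static final List<String> IMAGE_EXTENSIONS =
			Arrays.asList("png", "jpg", "jpeg", "bmp", "tif", "tiff", "gif");

	// Returns the lower-cased extension, or "" when the name has none
	static String extension(String name) {
		int dot = name.lastIndexOf('.');
		boolean hasExtension = dot > 0 && dot < name.length() - 1;
		return hasExtension ? name.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
	}

	static boolean isImage(String name) {
		return IMAGE_EXTENSIONS.contains(extension(name));
	}

	// Replaces the last extension with ".txt"; names without one just get ".txt" appended
	static String outputName(String name) {
		int dot = name.lastIndexOf('.');
		String base = dot > 0 ? name.substring(0, dot) : name;
		return base + ".txt";
	}

	public static void main(String[] args) {
		for (String name : new String[] {"slide.01.png", "notes.docx", "photo.JPG"}) {
			if (isImage(name)) {
				System.out.println(name + " -> " + outputName(name));
			}
		}
	}
}
```

Using `lastIndexOf` rather than `indexOf` keeps names with several dots intact, so `slide.01.png` maps to `slide.01.txt` rather than losing everything after the first dot.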

There is no need to import DLL files or anything like that, which is convenient, and the recognition accuracy is good too!
