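Compiling and installing tesseract-ocr 4.0 on Ubuntu Linux 16.04

This walkthrough mainly follows the official build instructions and lays out the whole process step by step.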

Linux

The build instructions for Linux also apply to other UNIX like operating systems.

Dependencies

  • A compiler for C and C++: GCC or Clang
  • GNU Autotools: autoconf, automake, libtool
  • autoconf-archive
  • pkg-config
  • Leptonica
  • libpng, libjpeg, libtiff
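Before installing anything, it can help to see which of the build tools listed above are already on PATH. The following is a minimal sketch, not part of the official instructions: the helper name missing_tools is made up for illustration, and libtoolize is assumed to be the executable that the libtool package installs on Debian/Ubuntu.

```shell
#!/bin/sh
# missing_tools NAME...: print every NAME that is not found on PATH.
# (Hypothetical helper, named here only for illustration.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executables for the tools in the dependency list. libtoolize is assumed
# to be what the libtool package provides on Debian/Ubuntu.
missing=$(missing_tools g++ autoconf automake libtoolize pkg-config)

if [ -n "$missing" ]; then
    echo "Missing build tools:"
    echo "$missing"
else
    echo "All build tools found"
fi
```

This only checks executables; the development libraries (Leptonica, libpng, libjpeg, libtiff) still need to be verified separately, for example through pkg-config.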

Ubuntu

If they are not already installed, you need the following libraries (Ubuntu 16.04/14.04):

1. Install the dependencies:

sudo apt-get install g++ autoconf automake libtool autoconf-archive pkg-config libpng12-dev libjpeg8-dev libtiff5-dev zlib1g-dev libleptonica-dev -y

Or copy the commands one at a time:
sudo apt-get install g++ # or clang++ (presumably)
sudo apt-get install autoconf automake libtool
sudo apt-get install autoconf-archive
sudo apt-get install pkg-config
sudo apt-get install libpng12-dev
sudo apt-get install libjpeg8-dev
sudo apt-get install libtiff5-dev
sudo apt-get install zlib1g-dev

If you plan to install the training tools, you also need the following libraries:

Install the libraries the training tools depend on:
sudo apt-get install libicu-dev libpango1.0-dev libcairo2-dev

Or:
sudo apt-get install libicu-dev
sudo apt-get install libpango1.0-dev
sudo apt-get install libcairo2-dev

Leptonica

You also need to install Leptonica. Ensure that the development headers for Leptonica are installed before compiling Tesseract.

2. Install Leptonica

Tesseract depends on this library; if it is missing, configure reports an error.

The latest Tesseract 4.0 and 3.05 require Leptonica to be built from source:

git clone https://github.com/DanBloomberg/leptonica.git
cd leptonica
./autogen.sh   # generates the configure script in a fresh git checkout
./configure
make -j8 && sudo make install

Tesseract versions and the minimum version of Leptonica required:

Tesseract   Leptonica   Ubuntu
4.00        1.74.2      Must build from source
3.05        1.74.0      Must build from source
3.04        1.71        Ubuntu 16.04
3.03        1.70        Ubuntu 14.04
3.02        1.69        Ubuntu 12.04
3.01        1.67
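To check an installed Leptonica against the minimum in the table, one option is to compare version strings with GNU sort -V. This is a sketch under assumptions: version_ge is a made-up helper name, and the installed version is read from pkg-config using the module name lept, which is the name Leptonica's .pc file uses.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is at least version B.
# Relies on sort -V (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="1.74.2"   # minimum for Tesseract 4.00, taken from the table

# Fall back to "0" when pkg-config or the lept module is not available.
installed=$(pkg-config --modversion lept 2>/dev/null || echo "0")

if version_ge "$installed" "$required"; then
    echo "Leptonica $installed is new enough"
else
    echo "Leptonica $installed is older than $required: build from source"
fi
```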

One option is to install the distro's Leptonica package:

sudo apt-get install libleptonica-dev

but if you are using an oldish version of Linux, the Leptonica version may be too old, so you will need to build from source.

The sources are at https://github.com/DanBloomberg/leptonica . The instructions for building are given in Leptonica README.

Note that if building Leptonica from source, you may need to ensure that /usr/local/lib is in your library path. This is a standard Linux bug, and the information at Stackoverflow is very helpful.
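One per-session way to handle the library path is to add /usr/local/lib to LD_LIBRARY_PATH when it is not already listed. The sketch below assumes a POSIX shell; path_list_contains is a hypothetical helper written for this example. Running sudo ldconfig after installation is the system-wide alternative.

```shell
#!/bin/sh
# path_list_contains LIST DIR: succeed when DIR is one of the entries
# in the colon-separated LIST. (Hypothetical helper for this example.)
path_list_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

libdir=/usr/local/lib

# Prepend the directory only when it is not already present.
if ! path_list_contains "${LD_LIBRARY_PATH:-}" "$libdir"; then
    LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export LD_LIBRARY_PATH
fi

echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```

The setting lasts only for the current shell; put the lines in a shell startup file to make it persistent for one user.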

Installing Tesseract from Git

Please follow instructions in https://github.com/tesseract-ocr/tesseract/wiki/Compiling--GitInstallation

Also read Install Instructions

3. Build Tesseract

Clone the source code:
git clone https://github.com/tesseract-ocr/tesseract.git tesseract-ocr
cd tesseract-ocr
./autogen.sh
autoreconf -i
./configure
configure then reports:
Configuration is done.
You can now build and install tesseract by running:

$ make
$ sudo make install

Training tools can be built and installed with:

$ make training
$ sudo make training-install

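Continue the build: compile and install Tesseract first, then build and install the training tools.
make
sudo make install
make training
sudo make training-install

sudo ldconfig

That completes the whole build. Typing tesseract on the command line now prints its usage help.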



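4. Set up the language data
tesseract-ocr/tessdata is a configuration directory in the source tree; use a copy of it as the base and put every language pack you need in it.
Change to the parent directory of the tesseract-ocr checkout and copy it:
cd ..
cp -r tesseract-ocr/tessdata/ tessdata/
Download the language packs you need from https://github.com/tesseract-ocr/tessdata_best, which hosts trained data for many languages. For Simplified Chinese, download chi_sim.traineddata and chi_sim_vert.traineddata.

Put the downloaded language packs in the tessdata directory.

Set the TESSDATA_PREFIX environment variable to the parent directory of tessdata, for example: export TESSDATA_PREFIX=/media/sf_E_DRIVE/src-test/tesseract_all/tesseract_linux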
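A quick way to confirm that the language packs landed in the right place is to test for the .traineddata files directly. This is only a sketch: has_language is a made-up helper, and ./tessdata is an assumed default location that should be replaced with your own path.

```shell
#!/bin/sh
# has_language DIR LANG: succeed when DIR/LANG.traineddata is a regular file.
# (Hypothetical helper for this example.)
has_language() {
    [ -f "$1/$2.traineddata" ]
}

# Directory to inspect: first script argument, or ./tessdata (an assumption).
tessdata_dir="${1:-./tessdata}"

for lang in chi_sim chi_sim_vert; do
    if has_language "$tessdata_dir" "$lang"; then
        echo "found:   $lang"
    else
        echo "missing: $lang"
    fi
done
```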

 

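5. Use Tesseract
For details, see Tesseract's usage documentation.

tesseract /home/app/1.png output -l chi_sim
This recognizes the image /home/app/1.png, writes the result to output.txt, and uses the chi_sim language data (omit the .traineddata suffix; it is appended automatically).
Run cat output.txt to view the recognized text.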
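When several images need processing, the same command line can be generated per file. The sketch below only prints the commands instead of running them, so it works without Tesseract installed; output_base and ocr_command are hypothetical helper names, and paths containing spaces are not handled.

```shell
#!/bin/sh
# output_base IMAGE: print the file name without directory or extension.
output_base() {
    name=$(basename "$1")
    printf '%s\n' "${name%.*}"
}

# ocr_command IMAGE LANG: print the tesseract command line for one image.
# tesseract itself appends .txt to the output base name.
ocr_command() {
    printf 'tesseract %s %s -l %s\n' "$1" "$(output_base "$1")" "$2"
}

ocr_command /home/app/1.png chi_sim
# prints: tesseract /home/app/1.png 1 -l chi_sim
```

Piping the printed lines to sh would execute them once the output looks right.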


 

Install elsewhere / without root

Tesseract can be configured to install anywhere, which makes it possible to install it without root access.

To install it in $HOME/local:

./autogen.sh
./configure --prefix=$HOME/local/
make install

To install it in $HOME/local using Leptonica libraries also installed in $HOME/local:

./autogen.sh
LIBLEPT_HEADERSDIR=$HOME/local/include ./configure \
  --prefix=$HOME/local/ --with-extra-libraries=$HOME/local/lib
make install
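After a non-root install, the shell usually still needs to be told where the new files live. A minimal sketch, assuming the $HOME/local prefix from the commands above and the conventional bin and lib subdirectories:

```shell
#!/bin/sh
# Same prefix as passed to ./configure --prefix above.
prefix="$HOME/local"

# Make the installed executable and shared libraries visible to this shell.
PATH="$prefix/bin:$PATH"
LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH LD_LIBRARY_PATH

# Report which tesseract binary, if any, the shell now resolves.
command -v tesseract || echo "tesseract not found under $prefix/bin yet"
```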

Video representation of the Compiling process for Tesseract 4.0 and Leptonica 1.7.4 on Ubuntu 16.xx

Language Data

You can also use:

export TESSDATA_PREFIX=/some/path/to/tessdata 

to point to your tessdata directory (example: if your tessdata path is /usr/local/share/tessdata, you have to use export TESSDATA_PREFIX=/usr/local/share/).
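The parent-directory rule in the example can be expressed with dirname. This sketch follows the convention described here, where TESSDATA_PREFIX names the parent of tessdata; whether your installed Tesseract expects the parent or the tessdata directory itself is worth confirming against its own documentation. tessdata_prefix is a hypothetical helper name.

```shell
#!/bin/sh
# tessdata_prefix DIR: print the parent directory of DIR with a trailing slash.
# (Hypothetical helper for this example.)
tessdata_prefix() {
    parent=$(dirname "$1")
    case "$parent" in
        /) printf '/\n' ;;              # avoid printing "//" for a root-level DIR
        *) printf '%s/\n' "$parent" ;;
    esac
}

TESSDATA_PREFIX=$(tessdata_prefix /usr/local/share/tessdata)
export TESSDATA_PREFIX
echo "TESSDATA_PREFIX=$TESSDATA_PREFIX"
# prints: TESSDATA_PREFIX=/usr/local/share/
```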


