I’ve used MacPorts for this installation as it was more convenient for me. There are various installation guides for python-tesseract on the official website. Hopefully you already have xcode, apple-gcc, python, numpy and opencv installed. At the time of writing, the tesseract-ocr version was 3.01 (checked out revision 863) and the python-tesseract version was 0.8-1.7.
Step 1: Some often used tools (optional)
You might as well get the tools that are often needed when installing various packages: “subversion”, “wget”, “axel”, “cmake”, “automake”, “autoconf”, “libtool”, “swig” and “swig-python”. If you already have these, the following command will simply update them. Run it in your terminal:
$ sudo port install subversion wget axel cmake automake autoconf libtool swig swig-python
While installing the above, MacPorts discovered that it needed to rebuild apple-gcc42 and opencv. Building apple-gcc42 took quite a long time (20 mins or so) and also ate all my memory, so I advise you not to run other programs while it builds. In case you don’t already have apple-gcc, have a look at this blog article for installing apple-gcc.
Step 2: Leptonica
You will need “leptonica” before installing tesseract. Get it, build it and install it. You might want to check www.leptonica.com for the latest version.
The --prefix option used in Step 2 and Step 3 specifies where to install the files and prevents the issue of tesseract not finding leptonica. You can use some location other than /opt/local (use ./configure --help for more options):
$ wget http://www.leptonica.com/source/leptonica-1.69.tar.gz
$ tar zxvf leptonica-1.69.tar.gz
$ cd leptonica-1.69
$ ./configure --prefix=/opt/local
$ make
$ sudo make install
Step 3: Tesseract OCR Engine
Your terminal will be in the directory “leptonica-1.69”. You can go back to your home directory and get tesseract-ocr using svn (This really took a long while because it was getting “traineddata” for various languages):
$ svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
$ cd tesseract-ocr/
$ sed -i '.bak' 's/^libtoolize/glibtoolize/g' autogen.sh
$ sed -i '.bak' 's|usr/local|opt/local|g' configure.ac
$ ./autogen.sh
$ ./configure --prefix=/opt/local
$ make
$ sudo make install
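Before moving on, it can help to confirm that Step 2 and Step 3 really put their files under the chosen prefix. The snippet below is only a rough sketch: the helper name missing_files is my own, and the two relative paths (the tesseract binary and the leptonica header) are assumptions about what these versions install, so adjust them if your layout differs.

```python
import os

# Files I would expect under the prefix after Step 2 and Step 3.
# These relative paths are assumptions, not something the installers report.
EXPECTED = ("bin/tesseract", "include/leptonica/allheaders.h")


def missing_files(prefix="/opt/local", relative_paths=EXPECTED):
    """Return the expected files that do not exist under the given prefix."""
    return [rel for rel in relative_paths
            if not os.path.exists(os.path.join(prefix, rel))]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing under /opt/local: " + ", ".join(missing))
    else:
        print("All expected files were found under /opt/local")
```

If something is reported missing, the most likely cause is that a different --prefix was used for one of the two builds.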
Step 4: Install Python Wrapper for Tesseract
Change directory to your home directory and then:
$ wget http://python-tesseract.googlecode.com/files/python-tesseract.macosx-10.8-intel.tar.gz
$ sudo tar zxvf python-tesseract.macosx-10.8-intel.tar.gz -C /opt/local
Note that the -C /opt/local option will install tesseract.py and related things to /opt/local/Library/Python/2.7/site-packages/ and you’ll need to add this directory to your sys.path by modifying python’s search path.
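One way to modify the search path is from within Python itself, at the top of a script or IDLE session. This is a minimal sketch rather than the only way to do it; the helper name add_to_search_path is hypothetical, and the directory is the one mentioned above.

```python
import sys

# Directory where the tarball from Step 4 places tesseract.py.
SITE_PACKAGES = "/opt/local/Library/Python/2.7/site-packages"


def add_to_search_path(directory):
    """Append directory to sys.path unless it is already listed.

    Returns True if the directory was added, False if it was already there.
    """
    if directory in sys.path:
        return False
    sys.path.append(directory)
    return True


# Run this before trying to import the wrapper.
add_to_search_path(SITE_PACKAGES)
```

This only lasts for the current Python session. Setting the PYTHONPATH environment variable to the same directory in your shell profile makes the change apply to every session.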
Step 5: Using Tesseract via Python
Within python IDLE and python scripts you should now be able to import the tesseract module.
Tesseract will look for the *.traineddata files in /opt/local/share/tessdata because of the installation procedure (Step 2 and Step 3) shown above. I had to copy the “tessdata” folder from the “tesseract-ocr” folder (obtained from the svn checkout) to /opt/local/share so that things could finally work.
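If you are not sure whether the copy worked, you can list which language files are actually sitting in the tessdata directory. The following is a small sketch using only the standard library; the function name installed_languages is my own, and it assumes the default location described above.

```python
import glob
import os

SUFFIX = ".traineddata"


def installed_languages(tessdata_dir="/opt/local/share/tessdata"):
    """Return the sorted language codes that have a *.traineddata file."""
    pattern = os.path.join(tessdata_dir, "*" + SUFFIX)
    # Strip the directory and the suffix, so "eng.traineddata" becomes "eng".
    return sorted(os.path.basename(path)[:-len(SUFFIX)]
                  for path in glob.glob(pattern))


if __name__ == "__main__":
    languages = installed_languages()
    if languages:
        print("Found language data: " + ", ".join(languages))
    else:
        print("No *.traineddata files found in /opt/local/share/tessdata")
```

An empty result means the “tessdata” folder still needs to be copied, or that it was copied somewhere other than the install prefix.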
There are a few examples on python-tesseract’s official homepage that you can try out.