Installing python-tesseract on Mac OS X

I used MacPorts for this installation because it was more convenient for me. There are various installation guides for python-tesseract on the official website. Hopefully you already have Xcode, apple-gcc, Python, NumPy and OpenCV installed. At the time of writing, tesseract-ocr was at version 3.01 (checked out revision 863) and python-tesseract was at version 0.8-1.7.

Step 1: Some often used tools (optional)

You might as well get the tools that are often needed when installing various packages: “subversion”, “wget”, “axel”, “cmake”, “automake”, “autoconf”, “libtool”, “swig” and “swig-python”. If you already have some of these, MacPorts should leave them in place and install only what is missing. Use the following command at your terminal:

$ sudo port install subversion wget axel cmake automake autoconf libtool swig swig-python

While installing these, MacPorts discovered that it needed to rebuild apple-gcc42 and opencv. Building apple-gcc42 took quite a long time (20 minutes or so) and ate all my memory, so I advise you not to run other programs while it builds. In case you don’t already have apple-gcc, have a look at this blog article for installing apple-gcc.

Step 2: Leptonica

You will need “leptonica” before installing tesseract. Get it, build it and install it. You might want to check www.leptonica.com for the latest version.

The --prefix option used in Step 2 and Step 3 specifies where to install the files and prevents the issue of tesseract not finding leptonica. You can use a location other than /opt/local (run ./configure --help for more options):

$ wget http://www.leptonica.com/source/leptonica-1.69.tar.gz
$ tar zxvf leptonica-1.69.tar.gz
$ cd leptonica-1.69
$ ./configure --prefix=/opt/local
$ make
$ sudo make install

Step 3: Tesseract OCR Engine

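Your terminal will still be in the “leptonica-1.69” directory. Go back to your home directory (cd ~) and get tesseract-ocr using svn. The checkout took a long while because it also fetches the “traineddata” files for various languages. The two sed commands patch the build scripts for MacPorts: the first replaces libtoolize with glibtoolize (the name MacPorts gives GNU libtoolize), and the second points configure.ac at opt/local instead of usr/local: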

$ svn checkout http://tesseract-ocr.googlecode.com/svn/trunk/ tesseract-ocr
$ cd tesseract-ocr/
$ sed -i '.bak'  's/^libtoolize/glibtoolize/g' autogen.sh
$ sed -i '.bak'  's|usr/local|opt/local|g' configure.ac
$ ./autogen.sh
$ ./configure --prefix=/opt/local
$ make
$ sudo make install

Step 4: Install Python Wrapper for Tesseract

Change directory to your home directory and then:

$ wget http://python-tesseract.googlecode.com/files/python-tesseract.macosx-10.8-intel.tar.gz
$ sudo tar zxvf python-tesseract.macosx-10.8-intel.tar.gz -C /opt/local

Note that -C /opt/local makes tar unpack tesseract.py and the related files into /opt/local/Library/Python/2.7/site-packages/. You’ll need to add this directory to sys.path by modifying Python’s search path.
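One way to extend the search path from inside a script is sketched below. It assumes the archive was unpacked with -C /opt/local as above; SITE_DIR and add_to_search_path are names chosen here for illustration, not part of python-tesseract.

```python
import sys

# Directory named in Step 4; an assumption if you unpacked the archive elsewhere.
SITE_DIR = "/opt/local/Library/Python/2.7/site-packages"


def add_to_search_path(directory, path_list=None):
    """Append directory to the module search path unless it is already there."""
    if path_list is None:
        path_list = sys.path
    if directory not in path_list:
        path_list.append(directory)
    return path_list


add_to_search_path(SITE_DIR)
```

This only affects the running interpreter. For a permanent change, the usual alternatives are setting the PYTHONPATH environment variable or placing a .pth file in a site-packages directory that Python already searches.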

Step 5: Using Tesseract via Python

Within Python’s IDLE and in Python scripts you should now be able to import tesseract.
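A quick way to confirm that the import works, without crashing a script when it does not, is a guarded import. This is a minimal sketch; the helper name can_import is made up for illustration, and only the module name tesseract comes from this guide.

```python
import importlib


def can_import(module_name):
    """Return True if module_name imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if can_import("tesseract"):
    print("tesseract module found")
else:
    print("tesseract module not found; check sys.path (see Step 4)")
```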

Tesseract will look for the *.traineddata files in /opt/local/share/tessdata because of the installation procedure (Step 2 and Step 3) shown above. I had to copy the “tessdata” folder from the “tesseract-ocr” folder (obtained from the svn checkout) to /opt/local/share before things finally worked.
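To see which languages Tesseract should be able to load, you can list the *.traineddata files in that folder. The sketch below assumes the /opt/local/share/tessdata location described above; the function name installed_languages is hypothetical.

```python
import glob
import os

# Location used in this guide; an assumption if you chose another --prefix.
TESSDATA_DIR = "/opt/local/share/tessdata"


def installed_languages(tessdata_dir):
    """Return sorted language codes, one per <code>.traineddata file found."""
    pattern = os.path.join(tessdata_dir, "*.traineddata")
    names = [os.path.basename(p) for p in glob.glob(pattern)]
    return sorted(name[:-len(".traineddata")] for name in names)


langs = installed_languages(TESSDATA_DIR)
if langs:
    print("Languages available: " + ", ".join(langs))
else:
    print("No *.traineddata files in " + TESSDATA_DIR)
```

If the list comes back empty, the “tessdata” folder has most likely not been copied yet.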

There are a few examples on python-tesseract’s official homepage that you can try out.


3 comments

  1. When I used make, I got this error:

    /Applications/Xcode.app/Contents/Developer/usr/bin/make all-recursive
    Making all in ccutil
    /bin/sh ../libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -I.. -O2 -DNDEBUG -I/opt/local/include -I/usr/local/include//leptonica -DTESSDATA_PREFIX=/opt/local/share/ -std=c++11 -MT scanutils.lo -MD -MP -MF .deps/scanutils.Tpo -c -o scanutils.lo scanutils.cpp
    libtool: compile: g++ -DHAVE_CONFIG_H -I. -I.. -O2 -DNDEBUG -I/opt/local/include -I/usr/local/include//leptonica -DTESSDATA_PREFIX=/opt/local/share/ -std=c++11 -MT scanutils.lo -MD -MP -MF .deps/scanutils.Tpo -c scanutils.cpp -fno-common -DPIC -o .libs/scanutils.o
    scanutils.cpp:38:14: error: typedef redefinition with different types ('long' vs '__darwin_off_t' (aka 'long long'))
    typedef long off_t;
    ^
    /usr/include/sys/_types/_off_t.h:30:25: note: previous definition is here
    typedef __darwin_off_t off_t;

    1. I have the same problem. Any progress on this compilation error?
