PDF OCR with Fedora 24 and Tesseract

Install Tesseract, the German language pack, the build dependencies and OCRmyPDF with the following commands:

sudo dnf install python3-pip python3-devel libffi-devel qpdf tesseract tesseract-langpack-deu tesseract-osd
sudo python3 -m pip install ocrmypdf

Now you can add a searchable text layer to a PDF like this (-l deu selects German, matching the language pack installed above):

ocrmypdf -l deu input.pdf output.pdf
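
If you have a whole folder of scans, the same call can be wrapped in a small loop. The following is only a sketch: the function name ocr_all and the "-ocr" suffix for the output files are made up for this example, and it assumes the PDFs sit directly in one directory.

```shell
#!/usr/bin/env bash
# Sketch: OCR every PDF in a directory with the same options as above.
# ocr_all and the "-ocr" output suffix are hypothetical names for this example.
ocr_all() {
    local dir="${1:-.}"      # directory to scan, defaults to the current one
    local lang="${2:-deu}"   # Tesseract language code, defaults to German
    local in out
    for in in "$dir"/*.pdf; do
        [ -e "$in" ] || continue                    # glob matched nothing
        case "$in" in *-ocr.pdf) continue ;; esac   # skip results of earlier runs
        out="${in%.pdf}-ocr.pdf"
        ocrmypdf -l "$lang" "$in" "$out" || echo "failed: $in" >&2
    done
}
```

Calling ocr_all ~/scans deu would then write scan1-ocr.pdf next to scan1.pdf and so on, leaving the originals untouched.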

If you don’t install the tesseract-osd package, the conversion still works, but the following error messages appear:

Error opening data file /usr/share/tesseract/tessdata/osd.traineddata
   INFO -    8: [tesseract] Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
   INFO -    8: [tesseract] Failed loading language 'osd'
   INFO -    8: [tesseract] Tesseract couldn't load any languages!
   INFO -    8: [tesseract] Warning: Auto orientation and script detection requested, but osd language failed to load
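
To check up front whether the OSD data is there, you can look for the file named in the first line of the message. This is a rough sketch: has_osd is a made-up helper, and the default path is simply the one from the error message above, so it may differ on other distributions or Tesseract versions.

```shell
# Sketch: report whether the OSD data file is present.
# has_osd is a hypothetical helper; the default path is taken from the
# error message above and is an assumption for other systems.
has_osd() {
    local tessdata="${1:-/usr/share/tesseract/tessdata}"
    [ -f "$tessdata/osd.traineddata" ]
}

# Usage:
#   has_osd || echo "osd.traineddata missing: sudo dnf install tesseract-osd"
```

Running tesseract --list-langs should also show osd in its list once the package is installed.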
