raleighpublicrecord/dochive (GitHub): DocHive has two prerequisites, ImageMagick and Tesseract. It converts PDF pages to images and then OCRs each image. Its purpose is to extract numeric statistical tables from PDFs for import into spreadsheets.
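The two-step flow described for DocHive (rasterize a PDF page, then OCR the image) can be sketched roughly as follows. This is not DocHive's own code, only an illustration of the same idea in Python. It assumes the ImageMagick `convert` and `tesseract` command-line tools are installed and on the PATH. The function names, the 300 dpi default, and the file names are hypothetical choices made for the example.

```python
import subprocess
from pathlib import Path


def convert_command(pdf: Path, page: int, out_png: Path, dpi: int = 300) -> list[str]:
    """Build an ImageMagick command that rasterizes one PDF page (0-based index)."""
    # "file.pdf[N]" selects a single page; -density sets the rendering resolution.
    return ["convert", "-density", str(dpi), f"{pdf}[{page}]", str(out_png)]


def ocr_command(png: Path) -> list[str]:
    """Build a Tesseract command that prints the recognized text to stdout."""
    # Passing "stdout" as the output base makes tesseract write text to standard output.
    return ["tesseract", str(png), "stdout"]


def ocr_pdf_page(pdf: Path, page: int, workdir: Path) -> str:
    """Rasterize one page and return the text Tesseract recognizes on it."""
    out_png = workdir / f"page-{page}.png"
    subprocess.run(convert_command(pdf, page, out_png), check=True)
    result = subprocess.run(
        ocr_command(out_png), check=True, capture_output=True, text=True
    )
    return result.stdout
```

Calling `ocr_pdf_page(Path("report.pdf"), 0, Path("."))` would return the raw text of the first page, assuming a file named `report.pdf` exists. Turning that text into spreadsheet rows (splitting columns, cleaning up misread digits) is a separate step that this sketch does not attempt.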
OCRopus is an OCR system written in Python, using NumPy and SciPy, that focuses on large-scale machine learning for problems in document analysis. Early versions used Tesseract as the text recognizer.