raleighpublicrecord/dochive (GitHub): DocHive has two prerequisites, ImageMagick and Tesseract. It converts PDF pages to images and then OCRs the images. Its purpose is to extract numeric statistical tables from PDFs for import into spreadsheets.
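A minimal Python sketch of the same two-stage idea (rasterize with ImageMagick, then OCR with Tesseract), followed by a crude heuristic that turns OCR'd table lines into CSV rows. This is not DocHive's code. All function names are hypothetical. It assumes the ImageMagick 7 `magick` executable (ImageMagick 6 uses `convert` with the same arguments), Ghostscript for PDF reading, and Tesseract 4 or later (`--psm`; 3.x used `-psm`). The "label followed by trailing numbers" row rule is also an assumption and will not fit every table layout.

```python
from __future__ import annotations

import csv
import io
import re
import subprocess
from pathlib import Path

# Assumed numeric-cell pattern: 1234, 1,234, -12.5, .5
NUMBER = re.compile(r"^-?[\d,]*\.?\d+$")


def pdf_to_images_cmd(pdf: str, out_pattern: str, dpi: int = 300) -> list[str]:
    # ImageMagick 7 CLI (use "convert" on ImageMagick 6).
    # -density precedes the input so each PDF page is rasterized at `dpi`.
    return ["magick", "-density", str(dpi), pdf, out_pattern]


def ocr_cmd(image: str) -> list[str]:
    # Output base "stdout" makes Tesseract print text instead of writing a .txt file.
    # --psm 6 treats the image as one uniform block of text; other modes may suit
    # some tables better.
    return ["tesseract", image, "stdout", "--psm", "6"]


def to_row(line: str) -> list[str] | None:
    # Hypothetical heuristic: split "Label words 1,234 56.7" into a label and the
    # run of numeric cells at the end of the line (thousands separators removed).
    tokens = line.split()
    numbers: list[str] = []
    while tokens and NUMBER.match(tokens[-1]):
        numbers.insert(0, tokens.pop().replace(",", ""))
    if not numbers:
        return None  # no trailing numbers: treat as heading or prose and skip
    return [" ".join(tokens)] + numbers


def numeric_rows(text: str) -> list[list[str]]:
    return [row for line in text.splitlines() if (row := to_row(line)) is not None]


def rows_to_csv(rows: list[list[str]]) -> str:
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def pdf_tables_to_csv(pdf: str, workdir: str) -> str:
    # Requires `magick` and `tesseract` on PATH.
    out = Path(workdir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(pdf_to_images_cmd(pdf, str(out / "page-%03d.png")), check=True)
    rows: list[list[str]] = []
    # Zero-padded page numbers keep sorted() in page order.
    for image in sorted(out.glob("page-*.png")):
        result = subprocess.run(ocr_cmd(str(image)), check=True,
                                capture_output=True, text=True)
        rows.extend(numeric_rows(result.stdout))
    return rows_to_csv(rows)
```

For example, with made-up OCR output `"Population by county\nWake County 1,234 56.7"`, `numeric_rows` skips the heading and yields `[["Wake County", "1234", "56.7"]]`. Keeping the command builders separate from `subprocess.run` makes the pipeline easy to inspect or adapt (for instance, switching to `convert` or a different page segmentation mode) without running the external tools.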
OCRopus is an OCR system written in Python, NumPy, and SciPy, focusing on the use of large-scale machine learning to address problems in document analysis. Early versions used Tesseract as their character recognizer.
IMPACT is a Centre of Competence that makes digitisation of historical printed text in Europe faster, cheaper and better, and provides tools, services and facilities for further advancing the state of the art in this field.