IMPACT is a Centre of Competence that makes digitisation of historical printed text in Europe faster, cheaper and better, and provides tools, services and facilities for further advancing the state of the art in this field.
ALTO (Analyzed Layout and Text Object) is an XML Schema that defines technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema within the administrative metadata section of the Metadata Encoding and Transmission Schema (METS). However, ALTO instances can also exist as standalone documents used independently of METS.
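As a rough illustration of what a standalone ALTO instance looks like and how its text can be read back, here is a minimal Python sketch. The sample document is invented for this example and is far smaller than real OCR output. The helper names `local_name` and `alto_lines` are hypothetical. Because the ALTO namespace URI differs between schema versions, the sketch matches elements by local name rather than assuming one namespace.

```python
import xml.etree.ElementTree as ET

# Invented sample; real ALTO files carry many more elements and attributes.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="1000" HEIGHT="1500">
      <PrintSpace>
        <TextBlock ID="B1">
          <TextLine>
            <String CONTENT="Historic" HPOS="10" VPOS="20" WIDTH="80" HEIGHT="12"/>
            <SP/>
            <String CONTENT="newspapers" HPOS="95" VPOS="20" WIDTH="110" HEIGHT="12"/>
          </TextLine>
          <TextLine>
            <String CONTENT="online" HPOS="10" VPOS="40" WIDTH="60" HEIGHT="12"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_lines(xml_text):
    """Return the text of each TextLine, joining its String words with spaces."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in elem
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


print(alto_lines(SAMPLE_ALTO))
```

The `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes on each `String` give the word's position on the page, which is what makes ALTO useful for highlighting search hits on a page image.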
Chronicling America (Historic American Newspapers) provides bulk access to its OCR data. Each file decompresses into a directory structure that lets you easily map each OCR file to the URL identifier for that page.
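The mapping from an extracted OCR file to its page URL could be sketched in Python as follows. This is not an official tool: it assumes the extracted layout is `<lccn>/<yyyy>/<mm>/<dd>/ed-<n>/seq-<n>/ocr.txt` and that page URLs take the form `https://chroniclingamerica.loc.gov/lccn/<lccn>/<yyyy-mm-dd>/ed-<n>/seq-<n>/`. Both patterns should be checked against the current Chronicling America documentation. The function name `ocr_path_to_url` and the example path are hypothetical.

```python
from pathlib import PurePosixPath

# Assumed base of page URLs; verify against the site's documentation.
BASE_URL = "https://chroniclingamerica.loc.gov/lccn"


def ocr_path_to_url(path):
    """Map an extracted OCR file path to the assumed page URL.

    Assumed layout: <lccn>/<yyyy>/<mm>/<dd>/ed-<n>/seq-<n>/<ocr file>
    """
    parts = PurePosixPath(path).parts
    if len(parts) < 7:
        raise ValueError(f"unexpected OCR path layout: {path!r}")
    # Take the six directory components that precede the file name.
    lccn, year, month, day, edition, sequence = parts[-7:-1]
    return f"{BASE_URL}/{lccn}/{year}-{month}-{day}/{edition}/{sequence}/"


print(ocr_path_to_url("sn83030214/1903/05/01/ed-1/seq-1/ocr.txt"))
```

Taking the last seven path components lets the same function work whether the archive was extracted into the current directory or under some parent folder.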