Large quantities of historical newspapers are being digitized and run through OCR. We describe a framework for processing the OCR text to identify articles and extract their metadata. We describe the article schema and provide examples of features that facilitate automatic indexing of the articles. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.
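As a rough illustration of what identifying articles in OCR text can involve, the following minimal sketch splits a page of text into article records using an all-capitals headline heuristic. The `Article` fields, the `HEADLINE` pattern, and the `split_articles` function are assumptions made for this example; they are not the schema or the method of the framework described above, which relies on lexical semantics, structural models, and community content.

```python
import re
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record: field names are illustrative only.
    headline: str
    body: str


# Assumption: a headline is a line of at least six characters made up of
# capitals, digits, spaces, and basic punctuation. Real OCR output is noisier.
HEADLINE = re.compile(r"^[A-Z][A-Z0-9 ,.'\-]{5,}$")


def split_articles(ocr_text: str) -> list:
    """Group OCR lines into articles, starting a new one at each headline."""
    articles = []
    current = None
    for raw in ocr_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if HEADLINE.match(line):
            current = Article(headline=line, body="")
            articles.append(current)
        elif current is not None:
            # Re-join body lines that the column layout broke apart.
            current.body = (current.body + " " + line).strip()
        # Lines that appear before the first headline are dropped.
    return articles
```

A heuristic like this only covers the segmentation step; metadata such as dates, places, or event types would need further processing on each article body.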
ANNO is the virtual newspaper reading room of the Österreichische Nationalbibliothek (Austrian National Library). Historical Austrian newspapers and periodicals can be browsed and read online here.
Digitization project of the Universitätsbibliothek Augsburg (Augsburg University Library) and the Staats- und Stadtbibliothek Augsburg. Project scope: volumes for 1770 to 1806. A continuation covering 1807 to 1848 is in preparation.
The National Library of Australia, in collaboration with the Australian State and Territory libraries, has commenced a program to digitise out-of-copyright newspapers.