Large quantities of historical newspapers are being digitized and OCRed. We describe a framework for processing the OCRed text to identify articles and extract their metadata. We describe the article schema and provide examples of features that facilitate automatic indexing of the articles. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.
The <div> TYPE attribute vocabulary is a list of terms that may be used to categorise the core structural elements of an object in a METS document conforming to the Australian METS Profile. Examples of how these values may be applied are given in the Appendix – Content Models. The content models in the current version of the document represent use cases that have been tested by the Maintenance Agency; further content models and vocabulary terms will be added as they are developed.
New start-up [1] Smalltown is going after the local business listings market with an ambitious, focused social network model. It has a charming “smalltown” feel, and seeks to build a community of users around those listings.
Digitization project of the Universitätsbibliothek and the Staats- und Stadtbibliothek Augsburg. Project scope: volumes 1770 - 1806. Continuation covering 1807 - 1848 is in preparation.
J. Singh and A. Anand. Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval, page 361--364. New York, NY, USA, ACM, (2017)
A. Dallmann, F. Lemmerich, D. Zoller, and A. Hotho. Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB. Trier, Germany, 7.-9. October 2015, CEUR-WS.org, (2015)
D. Rajanen, M. Salminen, and N. Ravaja. Proceedings of the 19th International Academic Mindtrek Conference (Academic MindTrek 2015), page 155--162. New York, NY, USA, ACM, (2015)