The <div> TYPE attribute vocabulary is a list of terms that may be used to categorise the core structural elements of an object in a METS document conforming to the Australian METS Profile. Examples of how these values may be applied are given in the Appendix – Content Models. The content model in the current version of the document represent use cases that have been tested by the Maintenance Agency, and further content models and vocabulary terms will be added as they are developed.
The National Digital Newspaper Program (NDNP), a partnership between the National Endowment for the Humanities (NEH) and the Library of Congress (LC), is a long-term effort to develop an Internet-based, searchable database of all U.S. newspapers with descriptive information and select digitization of historic pages. Supported by NEH, this rich digital resource will be developed and permanently maintained at the Library of Congress. An NEH grant program will fund the contribution of content from, eventually, all U.S. states and territories.
The National Library of Australia, in collaboration the Australian State and Territory libraries, has commenced a
program to digitise out of copyright newspapers.
Large quantities of historical newspapers are being digitized and OCRd. We describe a framework for processing the OCRd text to identify articles and extract metadata for them. We describe the article schema and provide examples of features that facilitate automatic indexing of them. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.