Large quantities of historical newspapers are being digitized and OCRed. We describe a framework for processing the OCR text to identify articles and extract their metadata. We describe the article schema and provide examples of features that facilitate automatic indexing of articles. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.
ALTO (Analyzed Layout and Text Object) is an XML Schema that details technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema used within the administrative metadata section of the Metadata Encoding and Transmission Standard (METS). However, an ALTO instance can also exist as a standalone document used independently of METS.
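As a rough illustration of how layout and content sit together in an ALTO instance, the sketch below reads the text lines out of a small standalone document. The sample XML is a simplified, hypothetical fragment that has not been validated against the ALTO schema, and the helper names `local_name` and `alto_text_lines` are made up for this example. Tags are matched by local name so the same code should work regardless of which ALTO namespace version a file declares.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified ALTO fragment for illustration only
# (not validated against the ALTO schema).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine HPOS="100" VPOS="200" WIDTH="800" HEIGHT="40">
            <String CONTENT="Historical" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="40"/>
            <SP WIDTH="20"/>
            <String CONTENT="newspapers" HPOS="420" VPOS="200" WIDTH="320" HEIGHT="40"/>
          </TextLine>
          <TextLine HPOS="100" VPOS="260" WIDTH="800" HEIGHT="40">
            <String CONTENT="digitized" HPOS="100" VPOS="260" WIDTH="280" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def alto_text_lines(xml_text):
    """Return the text of each TextLine, joining its String words with spaces."""
    root = ET.fromstring(xml_text)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Only String children carry words; SP children mark spaces and are skipped.
        words = [
            child.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    for line in alto_text_lines(SAMPLE_ALTO):
        print(line)
```

The positional attributes (`HPOS`, `VPOS`, `WIDTH`, `HEIGHT`) are ignored here; a fuller reader would keep them to relate each word back to its place on the page image.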
The Archivists’ Toolkit™, or the AT, is the first open source archival data management system to provide broad, integrated support for the management of archives. It is intended for a wide range of archival repositories. The main goals of the AT are to support archival processing and production of access instruments, promote data standardization, promote efficiency, and lower training costs.
digitally breaks down electronic, paper, microfilm, or microfiche documents into their component parts and creates searchable content while simultaneously
The <div> TYPE attribute vocabulary is a list of terms that may be used to categorise the core structural elements of an object in a METS document conforming to the Australian METS Profile. Examples of how these values may be applied are given in the Appendix – Content Models. The content models in the current version of the document represent use cases that have been tested by the Maintenance Agency, and further content models and vocabulary terms will be added as they are developed.
So far, direct export of MODS is not supported. However, the metadata from GBV catalogues could in principle be converted to MODS, for example via MARC21.
The broad aim of the project is to kick-start a critical mass of METS-based projects within the UK, thereby ensuring that UK institutions are fully standards-based in their digital object management.
METS Navigator is a METS-based system developed by the Indiana University Digital Library Program for displaying and navigating sets of page images or other multi-part digital objects. METS, the Metadata Encoding and Transmission Standard, is an XML standard, maintained by the Library of Congress, for managing and describing digital library objects. Using the information in the METS <structMap> elements, METS Navigator builds a hierarchical menu that allows users to navigate to specific sections of a document, such as the title page, specific chapters, or illustrations. METS Navigator also allows simple navigation to the next, previous, first, and last page image or component part of a digital object.
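The idea of deriving a hierarchical menu from <structMap> can be sketched in a few lines of Python. This is not METS Navigator's own code; it is a minimal illustration that assumes a single structMap whose nested <div> elements carry LABEL and TYPE attributes. The sample document and the function names `build_menu`, `menu_from_mets`, and `render` are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DIV_TAG = f"{{{METS_NS}}}div"

# Hypothetical METS fragment for illustration only; a real document
# would also carry file sections, pointers to page images, and so on.
SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/">
  <structMap TYPE="logical">
    <div TYPE="book" LABEL="Sample Book">
      <div TYPE="titlepage" LABEL="Title Page" ORDER="1"/>
      <div TYPE="chapter" LABEL="Chapter 1" ORDER="2">
        <div TYPE="illustration" LABEL="Figure 1" ORDER="1"/>
      </div>
      <div TYPE="chapter" LABEL="Chapter 2" ORDER="3"/>
    </div>
  </structMap>
</mets>"""


def build_menu(div):
    """Turn one METS <div> and its nested <div> children into a menu entry."""
    return {
        # Fall back to TYPE when a div has no LABEL (an assumption of this sketch).
        "label": div.get("LABEL") or div.get("TYPE") or "(untitled)",
        "type": div.get("TYPE"),
        "children": [build_menu(child) for child in div.findall(DIV_TAG)],
    }


def menu_from_mets(xml_text):
    """Build a nested menu from the first <structMap> in a METS document."""
    root = ET.fromstring(xml_text)
    struct_map = root.find(f"{{{METS_NS}}}structMap")
    if struct_map is None:
        return []
    return [build_menu(div) for div in struct_map.findall(DIV_TAG)]


def render(menu, depth=0):
    """Flatten the menu into indented text lines, one per entry."""
    lines = []
    for entry in menu:
        lines.append("  " * depth + entry["label"])
        lines.extend(render(entry["children"], depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(menu_from_mets(SAMPLE_METS))))
```

Entries are listed in document order. Next, previous, first, and last navigation could be layered on top by walking the same tree in that order, although the sketch does not attempt it.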