The eXtensible Text Framework (XTF) is a powerful open source platform for providing access to digital content. Developed and maintained by the California Digital Library (CDL), XTF functions as the primary access technology for the CDL's digital collections and other digital projects worldwide.
The SBB Zeitungen METS-Profil - Exchange describes the data format for exchanging metadata for digital objects of digitized newspapers between the Staatsbibliothek zu Berlin and third parties that create this data as contractors.
textMD is an XML Schema maintained by the Library of Congress that details technical metadata for text-based digital objects. It allows for detailing properties such as encoding information (quality, platform, software, agent), character information (character set and size, byte order and size, line terminators), languages, fonts, markup information, processing and textual notes, technical requirements for printing and viewing, and page ordering and sequencing.
Because we follow the MODS schema, we built a web-based MODS editor to create MODS XML records. These records are deposited into the eXist native XML database, where they can be accessed via a REST API.
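As a rough illustration of the REST access described above, the following Python sketch builds a URL for a stored record and reads the title from a MODS document. The host, port, collection path (`/db/mods`), and file name are hypothetical and not taken from the project; only the MODS namespace URI and the `/exist/rest` path prefix of a default eXist installation are assumed to carry over.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Official MODS version 3 namespace.
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def rest_url(base, collection, resource):
    """Join a REST base URL, a collection path, and a resource name.

    The base (e.g. "http://localhost:8080/exist/rest") and the
    collection layout are assumptions made for illustration.
    """
    parts = collection.strip("/").split("/") + [resource]
    path = "/".join(quote(part) for part in parts)
    return f"{base.rstrip('/')}/{path}"


def mods_title(xml_text):
    """Return the first title of a MODS record, or None if absent."""
    root = ET.fromstring(xml_text)
    node = root.find("mods:titleInfo/mods:title", MODS_NS)
    return node.text if node is not None else None


# A tiny hand-written record standing in for a response body.
SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Sample Record</title></titleInfo>
</mods>"""

if __name__ == "__main__":
    print(rest_url("http://localhost:8080/exist/rest", "/db/mods", "rec1.xml"))
    print(mods_title(SAMPLE))
```

Against a live server, the body passed to `mods_title` would come from an HTTP GET on the URL returned by `rest_url`, for example with `urllib.request.urlopen`.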
ISO 2146 (Registry Services for Libraries and Related Organisations) is an international standard currently under development by ISO TC46 SC4 WG7 to operate as a framework for building registry services for libraries and related organisations. It takes the form of an information model that identifies the objects and data elements needed for the collaborative construction of registries of all types. It is not bound to any specific protocol or data schema. The aim is to be as abstract as possible, in order to facilitate a shared understanding of the common processes involved, across multiple communities of practice.
This schema is currently referred to as "NISO Metadata for Images in XML (NISO MIX)". MIX is expressed using the XML schema language of the World Wide Web Consortium. MIX is maintained for NISO by the Network Development and MARC Standards Office of the Library of Congress with input from users.
Large quantities of historical newspapers are being digitized and OCR'd. We describe a framework for processing the OCR'd text to identify articles and extract metadata for them. We describe the article schema and provide examples of features that facilitate automatic indexing of the articles. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.
breaks down digital electronic, paper, microfilm, or microfiche documents into their component parts and creates searchable content while simultaneously