@toennies

Quality Control using Semantic Technologies in Digital Libraries

Institut für Informationssysteme, TU Braunschweig, Mühlenpfordtstraße 23, 38106 Braunschweig, Germany, PhD thesis (December 2012)

Abstract

Controlled content quality, especially in terms of indexing, is one of the major advantages of digital libraries over general Web sources or Web search engines. Therefore, more and more digital libraries offer corpora related to a specialized domain. Beyond simple keyword-based searches, the resulting information systems often rely on entity-centered searches. Offering this kind of search requires high-quality document processing. However, given today’s information flood, the mostly manual effort of acquiring new sources and creating suitable (semantic) metadata for content indexing and retrieval is already prohibitive. A recent solution is the automatic generation of metadata, where mostly statistical techniques such as document classification and entity extraction are becoming more widespread. But in this case neglecting quality assurance is even more problematic, because heuristic generation often fails and the resulting low-quality metadata directly diminishes the quality of service that a digital library provides. Thus, quality assessment of the metadata annotations that an information system uses for subsequent querying of collections has to be enabled. In this thesis we discuss the importance of metadata quality assessment for information systems and the benefits gained from controlled and guaranteed quality.
