Techreport

Name Data Sources for Semantic Enrichment

Deliverable, part of Deliverable D2.4, Europeana Creative project (February 2015)

Abstract

Semantic enrichment in Europeana is a difficult task for several reasons: (1) metadata quality varies across collections and sometimes includes misallocated metadata fields; (2) metadata formatting practices vary across collections, e.g. some collections indicate the role of a creator in brackets after the creator name; (3) accurate language information is lacking. In this report we focus on Person and Institution enrichment (person Named Entity Recognition), which is an ambitious task in itself. Historic people are often referred to by many names, so successful semantic enrichment depends on integrating high-quality, high-coverage datasets that provide name information. Many Name Authority files are maintained at libraries, museums and other heritage institutions worldwide, e.g. VIAF, ISNI, Getty ULAN and the British Museum. Linked Open Data (LOD) datasets such as DBpedia, Wikidata and Freebase also contain a plethora of names. We analyze some of the available datasets in terms of person coverage, name coverage, language tags, quality, and extra features that can be useful for enrichment. We also analyze the important topic of coreferencing, i.e. how well connected the sources are to each other.
