Abstract

Natural language technologies have long been envisioned to play a crucial role in developing a Semantic Web. The significance of textual content on the Web has increased with the rise of Web 2.0 and mass participation in content generation. Yet natural language technologies face major challenges in dealing with the heterogeneity of Web content; key among these is domain and task adaptation. To address this challenge, the authors consider the problem of semantically annotating Wikipedia. Specifically, they investigate a method for dealing with domain and task adaptation of semantic taggers in cases where parallel text and metadata are available. By creating a semantic mapping between the vocabularies of two sources, Wikipedia and the original annotated corpus, they improve their tagger's performance on Wikipedia. Moreover, by applying their tagger and the mapping between sources, they significantly extend the metadata currently available in the DBpedia collection. This article is part of a special issue on Natural Language Processing and the Web.
