Inproceedings

Analyzing and Accessing Wikipedia as a Lexical Semantic Resource

Biannual Conference of the Society for Computational Linguistics and Language Technology (2007)

Abstract

We analyze Wikipedia as a lexical semantic resource and compare it with conventional resources such as dictionaries, thesauri, and semantic wordnets. Different parts of Wikipedia reflect different aspects of these resources. We show that Wikipedia contains a vast amount of knowledge about, e.g., named entities, domain-specific terms, and rare word senses. If Wikipedia is to be used as a lexical semantic resource in large-scale NLP tasks, efficient programmatic access to the knowledge therein is required. We review existing access mechanisms and show that they are limited with respect to performance and the provided access functions. Therefore, we introduce a general-purpose, high-performance Java-based Wikipedia API that overcomes these limitations. It is available for research purposes at http://www.ukp.tu-darmstadt.de/software/WikipediaAPI.
