Building an Evaluation Corpus for German Question Answering by Harvesting Wikipedia

Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06), Genoa, Italy, European Language Resources Association (ELRA), (May 2006)

Abstract

The growing interest in open-domain question answering is limited by the lack of evaluation and training resources. To overcome this resource bottleneck for German, we propose a novel methodology to acquire new question-answer pairs for system evaluation that relies on volunteer collaboration over the Internet. Utilizing Wikipedia, a popular free online encyclopedia available in several languages, we show that the data acquisition problem can be cast as a Web experiment. We present a Web-based annotation tool and carry out a distributed data collection experiment. The data gathered from the mostly anonymous contributors is compared to a similar dataset produced in-house by domain experts on the one hand, and to the German questions from the CLEF QA 2004 effort on the other hand. Our analysis of the datasets suggests that, using our novel method, a medium-scale evaluation resource can be built at very small cost in a short period of time. The technique and software developed here are readily applicable to other languages where free online encyclopedias are available, and our resulting corpus is likewise freely available.

