
German Encyclopedia Alignment Based on Information Retrieval Techniques

Research and Advanced Technology for Digital Libraries, volume 6273 of Lecture Notes in Computer Science. Springer Berlin / Heidelberg, 2010. DOI: 10.1007/978-3-642-15464-5_32.

Abstract

Collaboratively created online encyclopedias have become increasingly popular, and in terms of completeness they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have responded to this challenge by deciding to merge their corpora into a single, more complete encyclopedia. The crucial step in this merge process is the alignment of articles. We have developed a system that identifies corresponding entries across different encyclopedic corpora. At its core is an alignment algorithm that incorporates various techniques from the field of information retrieval. We evaluated the system on four real-world encyclopedias against a ground truth provided by domain experts. A combination of weighting and ranking techniques was found to deliver satisfactory performance.
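The abstract does not say which weighting and ranking techniques the system combines. As a purely illustrative sketch of the general idea, the code below aligns two toy corpora using one standard information-retrieval setup: TF-IDF term weighting (log-scaled term frequency times a smoothed inverse document frequency, computed over both corpora together) and ranking of candidate articles by cosine similarity. This is an assumption for illustration, not the authors' method. The function names `tokenize`, `tfidf_vectors`, `cosine` and `align`, the parameter `k`, and the sample articles are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical naive tokenizer: lowercase and split on Unicode word characters.
    # Real German text would likely need stemming and compound splitting.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs):
    # docs: dict mapping document id -> list of tokens.
    # Weight = (1 + log tf) * log(1 + N / df), a smoothed TF-IDF variant.
    n = len(docs)
    df = Counter()
    for toks in docs.values():
        df.update(set(toks))
    vecs = {}
    for doc_id, toks in docs.items():
        tf = Counter(toks)
        vecs[doc_id] = {t: (1 + math.log(c)) * math.log(1 + n / df[t]) for t, c in tf.items()}
    return vecs


def cosine(u, v):
    # Cosine similarity of two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def align(corpus_a, corpus_b, k=3):
    # For each article in corpus A, rank all articles in corpus B by cosine
    # similarity and keep the top-k (score, id) candidates.
    docs = {("a", i): tokenize(t) for i, t in corpus_a.items()}
    docs.update({("b", j): tokenize(t) for j, t in corpus_b.items()})
    vecs = tfidf_vectors(docs)  # IDF computed over both corpora combined
    result = {}
    for i in corpus_a:
        scored = [(cosine(vecs[("a", i)], vecs[("b", j)]), j) for j in corpus_b]
        scored.sort(key=lambda p: (-p[0], p[1]))  # highest score first, ties by id
        result[i] = scored[:k]
    return result


# Toy usage with made-up articles from two hypothetical encyclopedias.
corpus_a = {
    "A1": "Berlin ist die Hauptstadt von Deutschland",
    "A2": "Der Rhein ist ein Fluss in Europa",
}
corpus_b = {
    "B1": "Der Rhein fließt durch Europa und ist ein großer Fluss",
    "B2": "Hauptstadt Deutschlands ist Berlin",
}
matches = align(corpus_a, corpus_b, k=1)
for a_id, candidates in matches.items():
    print(a_id, "->", candidates[0][1])
```

Computing IDF over the union of both corpora puts all vectors in a shared weighting space, so scores are comparable across candidates. A production aligner would typically also apply a score threshold, so that articles with no counterpart in the other encyclopedia are left unaligned instead of being forced onto their best match.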
