Conference,

Bridging the Gap - Using External Knowledge Bases for Context-Aware Document Retrieval

(12/2013)

Abstract

Today, a vast amount of information is made available over the Web in the form of unstructured text indexed by Web search engines. But especially for searches on abstract concepts or context terms, a simple keyword-based Web search may compromise retrieval quality, because query terms may or may not directly occur in the texts (vocabulary problem). The respective state-of-the-art solution is query expansion, which leads to an increase in recall, although it often also leads to a steep decrease in retrieval precision. This decrease, however, is a severe problem for digital library providers: in libraries it is vital to ensure high-quality retrieval that meets current standards. In this paper we present an approach that allows even abstract context searches (conceptual queries) with high retrieval quality by using Wikipedia to semantically bridge the gap between query terms and textual content. We do not expand queries, but extract the most important terms from each text document in a focused Web collection and then enrich them with features gathered from Wikipedia. These enriched terms are further used to compute the relevance of a document with respect to a conceptual query. The evaluation shows significant improvements over query expansion approaches: the overall retrieval quality is increased up to 74.5% in mean average precision.
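
The abstract reports its result in mean average precision (MAP). As a reading aid, the following is a minimal sketch of the standard textbook definition of that metric, not the authors' evaluation code. The function names, the document identifiers, and the toy rankings are all hypothetical and chosen only for illustration.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked result list (standard definition).

    Precision is sampled at every rank where a relevant document appears,
    and the sum is divided by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of the per-query average precision over all queries."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(ranked, rel) for ranked, rel in runs) / len(runs)


# Hypothetical toy data: two queries with ranked results and relevance sets.
toy_runs = [
    (["a", "b", "c", "d"], {"a", "c"}),
    (["x", "y"], {"y"}),
]
toy_map = mean_average_precision(toy_runs)
```

In this sketch a relevant document that is never retrieved still counts in the denominator, so missing it lowers the score. The abstract does not say how the reported figure was computed beyond naming the metric.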

Users

  • @koehncke
  • @toennies