Extracting Cross References from Life Science Databases for Search Result Ranking

Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pages 1253--1258. New York, NY, USA, ACM, (2011)
DOI: 10.1145/2063576.2063758

Abstract

Scholars in life sciences have to process huge amounts of data in a disciplined and efficient way. These data are spread among thousands of databases which overlap in content but differ substantially with respect to interface, formats and data structure. Search engines have the potential of assisting in data retrieval from these structured sources but fall short of providing a relevance ranking of the results that reflects the needs of life science scholars. One such need is to acquire insights into cross-references among entities in the databases, whereby search hits with many cross-references are expected to be more informative than those with few cross-references. In this work, we investigate to what extent this expectation holds. We propose BioXREF, a method that extracts cross-references from multiple life science databases by combining targeted crawling, pointer chasing, sampling and information extraction. We study the retrieval quality of our method and the relationship between manually crafted relevance ranking and relevance ranking based on cross-references, and report on first, promising results.
