
Dataset Retrieval

7th IEEE International Conference on Semantic Computing, September 16-18, 2013, Irvine, California, USA (2013)

Abstract

In recent years, a large number of dataset repositories, catalogs and portals have emerged in the science and government realms. Once many datasets are published on such data portals, the question arises of how to retrieve the datasets that satisfy a user's information need. In this article, we present an approach for retrieving datasets according to user queries. We define dataset retrieval as a specialization of information retrieval: instead of retrieving documents that are relevant to a certain information need, dataset retrieval describes the process of returning relevant RDF datasets. As in information retrieval, the term relevance cannot be clearly defined when using traditional methods such as stemming. The inherent use of RDF in RDF datasets, however, enables a better way of retrieving the relevant ones. We therefore propose an additional retrieval mechanism inspired by facet search: dataset filtering. At query time, the entire set of available datasets is processed by a set of semantic filters, each of which can unambiguously decide whether or not a given dataset is relevant to the query. The resulting set is then returned to the requester. We implemented and evaluated our approach in CKAN, which fuels publicdata.eu and is the most popular data portal worldwide.
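
The dataset filtering step described in the abstract can be illustrated with a minimal sketch. It is not the implementation evaluated in CKAN. It assumes that each dataset is represented by a small set of RDF-style triples, that a filter is a yes/no predicate over one dataset, and that filters are combined conjunctively. The names Dataset, has_property_value and filter_datasets, as well as the ex: identifiers, are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, Iterable, List, Tuple

# A triple is (subject, predicate, object); plain strings keep the sketch self-contained.
Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for an RDF dataset and its metadata."""
    name: str
    triples: FrozenSet[Triple] = field(default_factory=frozenset)


# A semantic filter gives an unambiguous yes/no answer for a single dataset.
Filter = Callable[[Dataset], bool]


def has_property_value(predicate: str, obj: str) -> Filter:
    """Build a filter that accepts datasets containing a triple (?, predicate, obj)."""
    def accept(dataset: Dataset) -> bool:
        return any(p == predicate and o == obj for _, p, o in dataset.triples)
    return accept


def filter_datasets(datasets: Iterable[Dataset], filters: Iterable[Filter]) -> List[Dataset]:
    """Keep the datasets accepted by every filter (conjunction is an assumption here)."""
    active = list(filters)
    return [d for d in datasets if all(f(d) for f in active)]


DCT = "http://purl.org/dc/terms/"

# Made-up catalog entries, used only to exercise the filters.
catalog = [
    Dataset("air-quality", frozenset({
        ("ex:air", DCT + "spatial", "ex:Germany"),
        ("ex:air", DCT + "subject", "ex:Environment"),
    })),
    Dataset("budget", frozenset({
        ("ex:budget", DCT + "spatial", "ex:Germany"),
        ("ex:budget", DCT + "subject", "ex:Finance"),
    })),
    Dataset("rainfall", frozenset({
        ("ex:rain", DCT + "spatial", "ex:France"),
        ("ex:rain", DCT + "subject", "ex:Environment"),
    })),
]

query_filters = [
    has_property_value(DCT + "spatial", "ex:Germany"),
    has_property_value(DCT + "subject", "ex:Environment"),
]

result = filter_datasets(catalog, query_filters)
names = [d.name for d in result]
```

Unlike a ranked keyword search, each filter here either accepts or rejects a dataset, so the result is a set rather than a scored list. With an empty filter list, the whole catalog is returned.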
