Extracting relevant questions to an RDF dataset using formal concept analysis
M. d'Aquin and E. Motta. Proceedings of the Sixth International Conference on Knowledge Capture, pages 121-128. New York, NY, USA, ACM, (2011)
With the rise of linked data, more and more semantically described information is being published online according to the principles and technologies of the Semantic Web (especially RDF and SPARQL). The use of such standard technologies means that this data should be exploitable, integrable and reusable straight away. However, once a potentially interesting dataset has been discovered, significant effort is currently required to understand its schema, its content, the way to query it and what it can answer. In this paper, we propose a method and a tool to automatically discover questions that can be answered by an RDF dataset. We use formal concept analysis to build a hierarchy of meaningful sets of entities from a dataset. These sets of entities represent answers, whose common characteristics represent the clauses of the corresponding questions. This hierarchy can then be used as a querying interface, proposing questions of varying levels of granularity and specificity to the user. A major issue, however, is that thousands of questions can be included in this hierarchy. Based on an empirical analysis and using metrics inspired by both formal concept analysis and ontology summarization, we devise an approach for identifying relevant questions to act as a starting point for navigation in the question hierarchy.
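The core idea in the abstract, sets of entities as answers and their shared characteristics as question clauses, can be illustrated with a minimal sketch of formal concept analysis. This is not the authors' tool. The toy context, the "property=value" attribute encoding, and the helper names formal_concepts and as_question are all assumptions made for illustration, and the sketch leaves out the relevance metrics the paper describes.

```python
def formal_concepts(context):
    """Return all formal concepts (extent, intent) of a small context.

    `context` maps each entity to its set of attributes. Intents are
    obtained by closing the entities' attribute sets under intersection,
    which is adequate for toy data but not tuned for large datasets.
    """
    all_attrs = frozenset().union(*context.values())
    intents = {all_attrs}  # intent of the bottom concept
    for attrs in context.values():
        # Intersect every intent found so far with this entity's attributes.
        intents |= {intent & attrs for intent in intents}
    concepts = []
    for intent in intents:
        # The extent is every entity carrying all attributes of the intent.
        extent = frozenset(e for e, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Most general concepts (largest extents) first.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))


def as_question(intent):
    """Render an intent as a question (hypothetical phrasing)."""
    if not intent:
        return "What are all the entities?"
    return "What are the entities with " + " and ".join(sorted(intent)) + "?"


# Assumed toy data: entities described by "property=value" attributes.
context = {
    "ex:alice": {"type=Person", "worksAt=OU"},
    "ex:bob": {"type=Person", "worksAt=OU", "authorOf=paper1"},
    "ex:carol": {"type=Person", "authorOf=paper1"},
    "ex:paper1": {"type=Document"},
}

concepts = formal_concepts(context)
for extent, intent in concepts:
    print(as_question(intent), "->", sorted(extent))
```

Each concept pairs a question (the intent) with its answer (the extent). Ordering the concepts by extent inclusion gives the kind of hierarchy the abstract refers to, with general questions at the top and more specific ones below.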