PhD thesis,

Conceptual schema matching based on similarity heuristics

.
Department of Informatics – Pontifical Catholic University of Rio de Janeiro, DSc Thesis (Advisor: Casanova, Marco Antonio), (March 2009)

Abstract

Schema matching is a fundamental issue in many database applications, such as query mediation, database integration, catalog matching and data warehousing. In this thesis, we first address how to match catalogue schemas. A catalogue is a simple database that holds information about a set of objects, typically classified using terms taken from a given thesaurus. We introduce a matching approach, based on the notion of similarity, which applies to pairs of thesauri and to pairs of lists of properties. We then describe matchings based on mutual information and introduce variations that explore certain heuristics. Lastly, we discuss experimental results that evaluate the precision of the matchings introduced and that measure the influence of the heuristics. We then focus on the more complex problem of matching two schemas that belong to an expressive OWL dialect. We adopt an instance-based approach and, therefore, assume that a set of instances from each schema is available. We first decompose the problem of OWL schema matching into the problem of vocabulary matching and the problem of concept mapping. We also introduce sufficient conditions guaranteeing that a vocabulary matching induces a correct concept mapping. Next, we describe an OWL schema matching technique based on the notion of similarity. Lastly, we evaluate the precision of the technique using data available on the Web. Unlike any of the previous instance-based techniques, the matching process we describe uses similarity functions to induce vocabulary matchings in a non-trivial way, coping with an expressive OWL dialect. We also illustrate, through a set of examples, that the structure of OWL schemas may lead to incorrect concept mappings and indicate how to avoid such pitfalls.

Tags

Users

  • @jabreftest

Comments and Reviews