Abstract
Data integration is a pervasive challenge faced in applications
that need to query across multiple autonomous and
heterogeneous data sources. Data integration is crucial in
large enterprises that own a multitude of data sources, for
progress in large-scale scientific projects, where data sets are
being produced independently by multiple researchers, for
better cooperation among government agencies, each with
their own data sources, and in offering good search quality
across the millions of structured data sources on the
World-Wide Web.
Ten years ago we published “Querying Heterogeneous Information
Sources using Source Descriptions” [73], a paper
describing some aspects of the Information Manifold data
integration project. The Information Manifold and many
other projects conducted at the time [5, 6, 20, 25, 38, 43,
51, 66, 100] have led to tremendous progress on data integration
and to quite a few commercial data integration
products. This paper offers a perspective on the contributions
of the Information Manifold and its peers, describes
some of the important bodies of work in the data integration
field in the last ten years, and outlines some challenges
to data integration research today. We note in advance that
this is not intended to be a comprehensive survey of data
integration, and even though the reference list is long, it is
by no means complete.