Abstract

Data integration is a pervasive challenge faced in applications that need to query across multiple autonomous and heterogeneous data sources. Data integration is crucial in large enterprises that own a multitude of data sources, for progress in large-scale scientific projects, where data sets are being produced independently by multiple researchers, for better cooperation among government agencies, each with their own data sources, and in offering good search quality across the millions of structured data sources on the World-Wide Web. Ten years ago we published “Querying Heterogeneous Information Sources using Source Descriptions” [73], a paper describing some aspects of the Information Manifold data integration project. The Information Manifold and many other projects conducted at the time [5, 6, 20, 25, 38, 43, 51, 66, 100] have led to tremendous progress on data integration and to quite a few commercial data integration products. This paper offers a perspective on the contributions of the Information Manifold and its peers, describes some of the important bodies of work in the data integration field in the last ten years, and outlines some challenges to data integration research today. We note in advance that this is not intended to be a comprehensive survey of data integration, and even though the reference list is long, it is by no means complete.
