
Automatically incorporating new sources in keyword search-based data integration

P. Talukdar, Z. Ives, and F. Pereira. Proceedings of the 2010 international conference on Management of data, pages 387--398. New York, NY, USA, ACM (2010)
DOI: 10.1145/1807167.1807211

Abstract

Scientific data offers some of the most interesting challenges in data integration today. Scientific fields evolve rapidly and accumulate masses of observational and experimental data that needs to be annotated, revised, interlinked, and made available to other scientists. From the perspective of the user, this can be a major headache as the data they seek may initially be spread across many databases in need of integration. Worse, even if users are given a solution that integrates the current state of the source databases, new data sources appear with new data items of interest to the user. Here we build upon recent ideas for creating integrated views over data sources using keyword search techniques, ranked answers, and user feedback [32] to investigate how to automatically discover when a new data source has content relevant to a user's view - in essence, performing automatic data integration for incoming data sets. The new architecture accommodates a variety of methods to discover related attributes, including label propagation algorithms from the machine learning community [2] and existing schema matchers [11]. The user may provide feedback on the suggested new results, helping the system repair any bad alignments or increase the cost of including a new source that is not useful. We evaluate our approach on actual bioinformatics schemas and data, using state-of-the-art schema matchers as components. We also discuss how our architecture can be adapted to more traditional settings with a mediated schema.

Description

Automatically incorporating new sources in keyword search-based data integration

Links and resources

BibTeX key: Talukdar:2010:AIN:1807167.1807211
entry type: inproceedings
address: New York, NY, USA
booktitle: Proceedings of the 2010 international conference on Management of data
year: 2010
pages: 387--398
publisher: ACM
series: SIGMOD '10
location: Indianapolis, Indiana, USA
acmid: 1807211
isbn: 978-1-4503-0032-2
numpages: 12
DOI: 10.1145/1807167.1807211
url: http://doi.acm.org/10.1145/1807167.1807211

Cite this publication

%0 Conference Paper
%1 Talukdar:2010:AIN:1807167.1807211
%A Talukdar, Partha Pratim
%A Ives, Zachary G.
%A Pereira, Fernando
%B Proceedings of the 2010 international conference on Management of data
%C New York, NY, USA
%D 2010
%I ACM
%K data_integration dataspaces toread
%P 387--398
%R 10.1145/1807167.1807211
%T Automatically incorporating new sources in keyword search-based data integration
%U http://doi.acm.org/10.1145/1807167.1807211
%X Scientific data offers some of the most interesting challenges in data integration today. Scientific fields evolve rapidly and accumulate masses of observational and experimental data that needs to be annotated, revised, interlinked, and made available to other scientists. From the perspective of the user, this can be a major headache as the data they seek may initially be spread across many databases in need of integration. Worse, even if users are given a solution that integrates the current state of the source databases, new data sources appear with new data items of interest to the user. Here we build upon recent ideas for creating integrated views over data sources using keyword search techniques, ranked answers, and user feedback [32] to investigate how to automatically discover when a new data source has content relevant to a user's view - in essence, performing automatic data integration for incoming data sets. The new architecture accommodates a variety of methods to discover related attributes, including label propagation algorithms from the machine learning community [2] and existing schema matchers [11]. The user may provide feedback on the suggested new results, helping the system repair any bad alignments or increase the cost of including a new source that is not useful. We evaluate our approach on actual bioinformatics schemas and data, using state-of-the-art schema matchers as components. We also discuss how our architecture can be adapted to more traditional settings with a mediated schema.
%@ 978-1-4503-0032-2

@inproceedings{Talukdar:2010:AIN:1807167.1807211,
  abstract = {Scientific data offers some of the most interesting challenges in data integration today. Scientific fields evolve rapidly and accumulate masses of observational and experimental data that needs to be annotated, revised, interlinked, and made available to other scientists. From the perspective of the user, this can be a major headache as the data they seek may initially be spread across many databases in need of integration. Worse, even if users are given a solution that integrates the current state of the source databases, new data sources appear with new data items of interest to the user. Here we build upon recent ideas for creating integrated views over data sources using keyword search techniques, ranked answers, and user feedback [32] to investigate how to automatically discover when a new data source has content relevant to a user's view - in essence, performing automatic data integration for incoming data sets. The new architecture accommodates a variety of methods to discover related attributes, including label propagation algorithms from the machine learning community [2] and existing schema matchers [11]. The user may provide feedback on the suggested new results, helping the system repair any bad alignments or increase the cost of including a new source that is not useful. We evaluate our approach on actual bioinformatics schemas and data, using state-of-the-art schema matchers as components. We also discuss how our architecture can be adapted to more traditional settings with a mediated schema.},
  acmid = {1807211},
  added-at = {2012-05-04T14:25:05.000+0200},
  address = {New York, NY, USA},
  author = {Talukdar, Partha Pratim and Ives, Zachary G. and Pereira, Fernando},
  biburl = {https://www.bibsonomy.org/bibtex/29f791bff0af5f062c13e33b4c924502a/schmidt2},
  booktitle = {Proceedings of the 2010 international conference on Management of data},
  description = {Automatically incorporating new sources in keyword search-based data integration},
  doi = {10.1145/1807167.1807211},
  interhash = {78c667108ab6e45abc455e117bbb54ed},
  intrahash = {9f791bff0af5f062c13e33b4c924502a},
  isbn = {978-1-4503-0032-2},
  keywords = {data_integration dataspaces toread},
  location = {Indianapolis, Indiana, USA},
  numpages = {12},
  pages = {387--398},
  publisher = {ACM},
  series = {SIGMOD '10},
  timestamp = {2012-05-04T14:25:05.000+0200},
  title = {Automatically incorporating new sources in keyword search-based data integration},
  url = {http://doi.acm.org/10.1145/1807167.1807211},
  year = 2010
}
