Towards Sustainable view-based Extract-Transform-Load (ETL) Fusion of Open Data
kmueller stadler singh hellmann. Proceedings of the 3rd Workshop on Linked Data Quality co-located with the European Semantic Web Conference 2016 (ESWC 2016) (May 2016)
Openly available datasets originate from different data providers, which range from government agencies and commercial enterprises to communities of data enthusiasts. Integrating different source datasets into a single RDF graph by using ETL (Extract-Transform-Load) systems, which perform offline transformation, ontology matching and linking techniques, usually takes many iterations of revisions until the target dataset is made free of the most obvious mapping, linking and consistency errors. Since ETL systems produce the RDF offline, any mapping or content change requires a re-ingest of the relevant source data. When dealing with heterogeneous source datasets, creating a unified target dataset can be a tedious undertaking. Therefore, the paper proposes an RDF view-based ingestion approach, which allows real-time "debugging" of the unified dataset, where mappings and links can be changed with immediate effect. Once the unified graph passes all data quality tests, the RDF can be materialized. This process poses an alternative to existing ETL solutions.
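
The difference between offline ETL and a view-based approach can be illustrated with a toy sketch. It is not the paper's implementation: the class `RdfView`, the check `all_predicates_are_iris`, the `http://example.org/` prefix and the sample records are all hypothetical, and a real system would typically sit on a mapping language and a SPARQL endpoint rather than in-memory dictionaries. The sketch only shows the idea that triples are computed from the source on each read, so a mapping change takes effect without a re-ingest, and materialization happens once a quality test passes.

```python
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Record = Dict[str, str]


class RdfView:
    """Hypothetical virtual RDF graph: triples are derived on every read."""

    def __init__(self, records: List[Record], mapping: Dict[str, str], base: str):
        self.records = records  # source data, never copied or re-ingested
        self.mapping = mapping  # source column -> RDF predicate (editable)
        self.base = base        # assumed IRI prefix for subjects

    def triples(self) -> Iterator[Triple]:
        # Apply the current mapping lazily, so edits are visible immediately.
        for rec in self.records:
            subject = self.base + rec["id"]
            for column, predicate in self.mapping.items():
                if column in rec:
                    yield (subject, predicate, rec[column])

    def materialize(self) -> List[Triple]:
        # Freeze the view into a concrete list of triples.
        return list(self.triples())


def all_predicates_are_iris(view: RdfView) -> bool:
    # Toy data quality test: every predicate must be an absolute http(s) IRI.
    return all(p.startswith(("http://", "https://")) for _, p, _ in view.triples())


records = [{"id": "1", "name": "Leipzig"}, {"id": "2", "name": "Dresden"}]
view = RdfView(records, {"name": "label"}, "http://example.org/city/")

ok_before = all_predicates_are_iris(view)  # the bare "label" fails the test
view.mapping["name"] = "http://www.w3.org/2000/01/rdf-schema#label"
ok_after = all_predicates_are_iris(view)   # re-checked without any re-ingest
graph = view.materialize() if ok_after else []
```

In an offline ETL pipeline, the same correction would require regenerating the RDF from the source before the test could be rerun.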