copy delete add this publication to your clipboard
community post
history of this post
URL
DOI
BibTeX
EndNote
APA
Chicago
DIN 1505
Harvard
MSOffice XML

DistRDF2ML - Scalable Distributed In-Memory Machine Learning Pipelines for RDF Knowledge Graphs

C. Draschner, C. Stadler, F. Bakhshandegan Moghaddam, J. Lehmann, and H. Jabeen. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, page 4465–4474. New York, NY, USA, Association for Computing Machinery, (2021)
DOI: 10.1145/3459637.3481999

Abstract

This paper presents DistRDF2ML, the generic, scalable, and distributed framework for creating in-memory data preprocessing pipelines for Spark-based machine learning on RDF knowledge graphs. This framework introduces software modules that transform large-scale RDF data into ML-ready fixed-length numeric feature vectors. The developed modules are optimized to the multi-modal nature of knowledge graphs. DistRDF2ML provides aligned software design and usage principles as common data science stacks that offer an easy-to-use package for creating machine learning pipelines. The modules used in the pipeline, the hyper-parameters and the results are exported as a semantic structure that can be used to enrich the original knowledge graph. The semantic representation of metadata and machine learning results offers the advantage of increasing the machine learning pipelines' reusability, explainability, and reproducibility. The entire framework of DistRDF2ML is open source, integrated into the holistic SANSA stack, documented in scala-docs, and covered by unit tests. DistRDF2ML demonstrates its scalable design across different processing power configurations and (hyper-)parameter setups within various experiments. The framework brings the three worlds of knowledge graph engineers, distributed computation developers, and data scientists closer together and offers all of them the creation of explainable ML pipelines using a few lines of code.

Links and resources

BibTeX key: Draschner2021
entry type: inproceedings
address: New York, NY, USA
booktitle: Proceedings of the 30th ACM International Conference on Information & Knowledge Management
year: 2021
pages: 4465–4474
publisher: Association for Computing Machinery
series: CIKM '21
isbn: 9781450384469
numpages: 10
location: Virtual Event, Queensland, Australia
DOI: 10.1145/3459637.3481999
Document: https://svn.aksw.org/papers/2021/cikm-distrdf2ml/public.pdf

@aksw's tags highlighted

Cite this publication

%0 Conference Paper %1 Draschner2021 %A Draschner, Carsten Felix %A Stadler, Claus %A Bakhshandegan Moghaddam, Farshad %A Lehmann, Jens %A Jabeen, Hajira %B Proceedings of the 30th ACM International Conference on Information & Knowledge Management %C New York, NY, USA %D 2021 %I Association for Computing Machinery %K group_aksw jabeen lehmann stadler %P 4465–4474 %R 10.1145/3459637.3481999 %T DistRDF2ML - Scalable Distributed In-Memory Machine Learning Pipelines for RDF Knowledge Graphs %U https://svn.aksw.org/papers/2021/cikm-distrdf2ml/public.pdf %X This paper presents DistRDF2ML, the generic, scalable, and distributed framework for creating in-memory data preprocessing pipelines for Spark-based machine learning on RDF knowledge graphs. This framework introduces software modules that transform large-scale RDF data into ML-ready fixed-length numeric feature vectors. The developed modules are optimized to the multi-modal nature of knowledge graphs. DistRDF2ML provides aligned software design and usage principles as common data science stacks that offer an easy-to-use package for creating machine learning pipelines. The modules used in the pipeline, the hyper-parameters and the results are exported as a semantic structure that can be used to enrich the original knowledge graph. The semantic representation of metadata and machine learning results offers the advantage of increasing the machine learning pipelines' reusability, explainability, and reproducibility. The entire framework of DistRDF2ML is open source, integrated into the holistic SANSA stack, documented in scala-docs, and covered by unit tests. DistRDF2ML demonstrates its scalable design across different processing power configurations and (hyper-)parameter setups within various experiments. The framework brings the three worlds of knowledge graph engineers, distributed computation developers, and data scientists closer together and offers all of them the creation of explainable ML pipelines using a few lines of code. %@ 9781450384469

BibSonomy

copy delete add this publication to your clipboard
community post
history of this post
URL
DOI
BibTeX
EndNote
APA
Chicago
DIN 1505
Harvard
MSOffice XML

DistRDF2ML - Scalable Distributed In-Memory Machine Learning Pipelines for RDF Knowledge Graphs

Abstract

Links and resources

Tags

community

Cite this publication

More citation styles

search on

Meta data

Comments and Reviews
(0)

BibSonomy

copydeleteadd this publication to your clipboardcommunity posthistory of this postURLDOIBibTeXEndNoteAPAChicagoDIN 1505HarvardMSOffice XML DistRDF2ML - Scalable Distributed In-Memory Machine Learning Pipelines for RDF Knowledge Graphs

Abstract

Links and resources

Tags

community

Cite this publication

More citation styles

search on

Meta data

Comments and Reviews (0)

copy delete add this publication to your clipboard
community post
history of this post
URL
DOI
BibTeX
EndNote
APA
Chicago
DIN 1505
Harvard
MSOffice XML

DistRDF2ML - Scalable Distributed In-Memory Machine Learning Pipelines for RDF Knowledge Graphs

Comments and Reviews
(0)