A Message Passing framework with Multiple data integration for miRNA-Disease association prediction

Scientific Reports (September 2022)
DOI: 10.1038/s41598-022-20529-5

Abstract

MicroRNA (miRNA) is a highly conserved class of non-coding RNA that plays an important role in many diseases. Identifying miRNA-disease associations can pave the way for better clinical diagnosis and for finding potential drug targets. We propose a biologically motivated, data-driven approach to miRNA-disease association prediction that overcomes the data scarcity problem by exploiting information from multiple data sources. The key idea is to enrich the existing miRNA/disease-protein-coding gene (PCG) associations via a message passing framework and then use disease ontology information for further feature filtering. The enriched and filtered PCG associations are used to construct the interconnected miRNA-PCG-disease network, on which a structural deep network embedding (SDNE) model is trained. Finally, the pre-trained embeddings are concatenated with biologically relevant features from the miRNA family and disease semantic similarity to form the pair input representations for a Random Forest classifier, which predicts the miRNA-disease association probabilities. We present large-scale comparative experiments, ablation studies, and case studies to showcase our approach’s superiority. In addition, we make the model prediction results for 1,618 miRNAs and 3,679 diseases, along with all related information, publicly available at http://software.mpm.leibniz-ai-lab.de/ to foster assessments and future adoption.
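The final step of the abstract, concatenating embeddings with family and semantic-similarity features into one vector per miRNA-disease pair, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `pair_features`, the array shapes, the one-hot family encoding, and the random placeholder data are all assumptions made for illustration.

```python
import numpy as np


def pair_features(mirna_emb, mirna_extra, disease_emb, disease_extra, pairs):
    """Build one feature row per (miRNA index, disease index) pair.

    Each row is the concatenation of the miRNA embedding, the extra miRNA
    features, the disease embedding, and the extra disease features.
    """
    rows = [
        np.concatenate(
            [mirna_emb[m], mirna_extra[m], disease_emb[d], disease_extra[d]]
        )
        for m, d in pairs
    ]
    return np.vstack(rows)


# Placeholder data (hypothetical sizes; real embeddings would come from SDNE).
rng = np.random.default_rng(0)
n_mirna, n_disease, emb_dim = 4, 3, 8
mirna_emb = rng.normal(size=(n_mirna, emb_dim))
disease_emb = rng.normal(size=(n_disease, emb_dim))

# Assumed encoding: one-hot miRNA family membership (2 hypothetical families).
mirna_family = np.eye(2)[[0, 0, 1, 1]]
# Assumed encoding: each disease is described by its row of a
# disease-disease semantic similarity matrix (random placeholder values).
disease_sim = rng.random(size=(n_disease, n_disease))

pairs = [(0, 0), (1, 2), (3, 1)]
X = pair_features(mirna_emb, mirna_family, disease_emb, disease_sim, pairs)
print(X.shape)  # one row per pair, 8 + 2 + 8 + 3 columns
```

A matrix like `X`, together with binary association labels, could then be passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`, whose `predict_proba` method returns class probabilities.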

Description

MicroRNA (miRNA) is a highly conserved class of non-coding RNA that plays an important role in many diseases. Identifying miRNA-disease associations can pave the way for better clinical diagnosis and for finding potential drug targets. We propose a biologically motivated, data-driven approach to miRNA-disease association prediction that overcomes the data scarcity problem by exploiting information from multiple data sources. At the same time, we propose a parameter-free yet effective mechanism to control the quantity and quality of the added information sources. Our contribution also lies in curating and releasing large independent test sets to evaluate benchmarked models under various test settings, including those for new miRNAs and new diseases. The proposed model achieves state-of-the-art performance across multiple test settings and produces strong results in realistic case studies on little-known diseases. A survival analysis on publicly available gene expression data further supports our findings. We make the model prediction results for 1,618 miRNAs and 3,679 diseases, along with all biologically related information, publicly available at http://software.mpm.leibniz-ai-lab.de/ to foster biological assessments and future adoption.
