MicroRNA (miRNA) is a highly conserved class of non-coding RNA that plays an important role in many diseases. Identifying miRNA-disease associations can pave the way for better clinical diagnosis and the discovery of potential drug targets. We propose a biologically motivated, data-driven approach for miRNA-disease association prediction that overcomes the data scarcity problem by exploiting information from multiple data sources. The key idea is to enrich the existing miRNA/disease-protein-coding gene (PCG) associations via a message passing framework and then to use disease ontology information for further feature filtering. The enriched and filtered PCG associations are used to construct an interconnected miRNA-PCG-disease network, on which a structural deep network embedding (SDNE) model is trained. Finally, the pre-trained embeddings are concatenated with biologically relevant features derived from miRNA family information and disease semantic similarity to form the input representation of each miRNA-disease pair, and a Random Forest classifier predicts the association probability of each pair. We present large-scale comparative experiments, an ablation study, and case studies to demonstrate our approach’s superiority. In addition, we make the model’s prediction results for 1,618 miRNAs and 3,679 diseases, along with all related information, publicly available at http://software.mpm.leibniz-ai-lab.de/ to foster assessment and future adoption.
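The enrichment and pair-representation steps can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the majority-vote propagation rule, the assumption of a symmetric PCG interaction network, and the names `enrich_associations` and `pair_features` are hypothetical; the disease ontology filtering step is omitted; and the SDNE embeddings are assumed to be precomputed vectors.

```python
from collections import defaultdict


def enrich_associations(assoc, ppi):
    """One round of message passing over a PCG interaction network.

    assoc: maps an entity (miRNA or disease) to its set of known PCGs.
    ppi:   maps each PCG to the set of PCGs it interacts with; assumed
           symmetric (if a interacts with b, b interacts with a).
    Hypothetical rule: a new PCG is added to an entity when a strict
    majority of its interaction partners are already associated with it.
    """
    enriched = {}
    for entity, genes in assoc.items():
        votes = defaultdict(int)
        # Every known gene sends one message to each of its partners.
        for gene in genes:
            for partner in ppi.get(gene, ()):
                if partner not in genes:
                    votes[partner] += 1
        # Keep candidates supported by more than half of their partners.
        added = {g for g, v in votes.items() if v > len(ppi.get(g, ())) / 2}
        # Copy the original set so the input mapping is left unchanged.
        enriched[entity] = set(genes) | added
    return enriched


def pair_features(mirna, disease, emb, family_of, families, disease_sim, diseases):
    """Concatenate the features of one (miRNA, disease) pair.

    emb:         maps each node to a precomputed embedding (list of floats),
                 standing in for the SDNE output.
    family_of:   maps a miRNA to its family name, one-hot encoded over `families`.
    disease_sim: maps (disease, disease) tuples to a semantic similarity score;
                 the disease's similarity row over `diseases` is appended.
    """
    family_onehot = [1.0 if family_of.get(mirna) == f else 0.0 for f in families]
    sim_row = [disease_sim.get((disease, d), 0.0) for d in diseases]
    return emb[mirna] + emb[disease] + family_onehot + sim_row


ppi = {"G1": {"G2", "G3"}, "G2": {"G1", "G3"},
       "G3": {"G1", "G2", "G4"}, "G4": {"G3"}}
# G3 is added: 2 of its 3 partners (G1, G2) are already known for miR-1.
print(sorted(enrich_associations({"miR-1": {"G1", "G2"}}, ppi)["miR-1"]))
# ['G1', 'G2', 'G3']
```

The resulting pair vectors and their 0/1 labels could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`, whose `predict_proba` method returns per-class probabilities that can serve as association scores.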
Description
MicroRNA (miRNA) is a highly conserved class of non-coding RNA that
plays an important role in many diseases. Identifying miRNA-disease
associations can pave the way for better clinical diagnosis and the
discovery of potential drug targets.
We propose a biologically motivated, data-driven approach for
miRNA-disease association prediction that overcomes the data scarcity
problem by exploiting information from multiple data sources. We also
propose a parameter-free yet effective mechanism to control the quantity
and quality of the information added from these sources. A further
contribution is the curation and release of large independent test sets
for evaluating benchmarked models under various test settings, including
settings with new miRNAs and new diseases. The proposed model achieves
state-of-the-art performance across multiple test settings and yields
strong results in realistic case studies of little-known diseases.
Survival analysis on publicly available gene expression data further
supports our findings.
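Evaluating on new miRNAs or new diseases requires that no association
of a held-out entity appears in training. The sketch below shows one
generic way to build such a split; the function name `cold_start_split`,
the default test fraction, and the random selection of held-out
entities are illustrative assumptions, not the procedure used to curate
the released test sets.

```python
import random


def cold_start_split(pairs, entity_index=0, test_fraction=0.2, seed=0):
    """Split (miRNA, disease) pairs so that held-out entities are unseen.

    entity_index=0 holds out miRNAs (the "new miRNA" setting);
    entity_index=1 holds out diseases (the "new disease" setting).
    """
    entities = sorted({pair[entity_index] for pair in pairs})
    if not entities:
        return list(pairs), []
    rng = random.Random(seed)
    n_test = max(1, int(len(entities) * test_fraction))
    held_out = set(rng.sample(entities, n_test))
    # Every pair touching a held-out entity goes to the test set.
    train = [p for p in pairs if p[entity_index] not in held_out]
    test = [p for p in pairs if p[entity_index] in held_out]
    return train, test


pairs = [("miR-1", "D1"), ("miR-1", "D2"), ("miR-21", "D1"),
         ("miR-155", "D3"), ("let-7a", "D2")]
train, test = cold_start_split(pairs, entity_index=0, test_fraction=0.5)
# No miRNA in the test set occurs anywhere in the training set.
assert not {m for m, _ in train} & {m for m, _ in test}
```

Splitting by entity rather than by pair is what separates these settings
from a random pair-level split, in which most test miRNAs and diseases
would already have been seen during training.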
We make the model's prediction results for 1,618 miRNAs and 3,679
diseases, along with all biologically related information, publicly
available at http://software.mpm.leibniz-ai-lab.de/ to foster biological
assessment and future adoption.