Abstract

High-throughput technologies such as microarrays and RNA-sequencing (RNA-seq) make it possible to precisely quantify transcriptomic profiles, generating datasets that are inevitably high-dimensional. In this work, we investigate whether the whole human transcriptome can be represented in a compressed, low-dimensional latent space without losing relevant information. We thus constructed low-dimensional latent feature spaces of the human genome using three dimensionality reduction approaches and a diverse set of curated datasets. We applied standard Principal Component Analysis (PCA), kernel PCA, and autoencoder neural networks to 1360 datasets from four different measurement technologies. The latent feature spaces were tested for their ability to (a) reconstruct the original data and (b) improve predictive performance on validation datasets not used during the creation of the feature space. While linear techniques show better reconstruction performance, nonlinear approaches, particularly neural-based models, seem able to capture non-additive interaction effects and thus offer stronger predictive capabilities. Despite the limited sample size of each dataset and the biological and technological heterogeneity across studies, our results show that low-dimensional representations of the human transcriptome can be achieved by integrating hundreds of datasets. The resulting space is two to three orders of magnitude smaller than the raw data, capturing a large portion of the original data variability and ultimately reducing computational time for downstream analyses.
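To make the comparison concrete, below is a minimal, hypothetical sketch (not the authors' pipeline) of the three dimensionality reduction approaches named in the abstract applied to a toy expression matrix, using scikit-learn; the latent size, kernel choice, and network shape are assumptions for illustration only.

```python
# Hypothetical sketch: compare PCA, kernel PCA, and a small autoencoder-style
# network on a toy samples-by-genes matrix. Not the authors' actual setup.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))   # stand-in for a samples x genes expression matrix
k = 32                             # assumed latent dimensionality

# Linear PCA: project to k components, then reconstruct.
pca = PCA(n_components=k).fit(X)
X_pca = pca.inverse_transform(pca.transform(X))

# Kernel PCA with an RBF kernel; reconstruction via the learned pre-image map.
kpca = KernelPCA(n_components=k, kernel="rbf", fit_inverse_transform=True).fit(X)
X_kpca = kpca.inverse_transform(kpca.transform(X))

# A small autoencoder approximated by an MLP trained to reproduce its input,
# with a k-unit bottleneck (a stand-in for the neural models in the paper).
ae = MLPRegressor(hidden_layer_sizes=(256, k, 256), max_iter=500, random_state=0)
ae.fit(X, X)
X_ae = ae.predict(X)

for name, Xr in [("PCA", X_pca), ("kernel PCA", X_kpca), ("autoencoder", X_ae)]:
    print(f"{name:12s} reconstruction MSE: {mean_squared_error(X, Xr):.4f}")
```

In practice, the paper evaluates reconstruction on held-out datasets and additionally measures predictive performance of the latent features; the snippet above only illustrates in-sample reconstruction error for the three model families.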
