Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes.
Description
GC-content normalization for RNA-Seq data. - PubMed - NCBI
%0 Journal Article
%1 Risso:2011:BMC-Bioinformatics:22177264
%A Risso, D
%A Schwartz, K
%A Sherlock, G
%A Dudoit, S
%D 2011
%J BMC Bioinformatics
%K MUSTREAD fulltext quality-control rna-seq software
%P 480
%R 10.1186/1471-2105-12-480
%T GC-content normalization for RNA-Seq data
%U https://www.ncbi.nlm.nih.gov/pubmed/22177264?dopt=Abstract
%V 12
%X Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes.
@article{Risso:2011:BMC-Bioinformatics:22177264,
abstract = {Transcriptome sequencing (RNA-Seq) has become the assay of choice for high-throughput studies of gene expression. However, as is the case with microarrays, major technology-related artifacts and biases affect the resulting expression measures. Normalization is therefore essential to ensure accurate inference of expression levels and subsequent analyses thereof. We focus on biases related to GC-content and demonstrate the existence of strong sample-specific GC-content effects on RNA-Seq read counts, which can substantially bias differential expression analysis. We propose three simple within-lane gene-level GC-content normalization approaches and assess their performance on two different RNA-Seq datasets, involving different species and experimental designs. Our methods are compared to state-of-the-art normalization procedures in terms of bias and mean squared error for expression fold-change estimation and in terms of Type I error and p-value distributions for tests of differential expression. The exploratory data analysis and normalization methods proposed in this article are implemented in the open-source Bioconductor R package EDASeq. Our within-lane normalization procedures, followed by between-lane normalization, reduce GC-content bias and lead to more accurate estimates of expression fold-changes and tests of differential expression. Such results are crucial for the biological interpretation of RNA-Seq experiments, where downstream analyses can be sensitive to the supplied lists of genes.},
added-at = {2017-09-30T11:18:26.000+0200},
author = {Risso, D and Schwartz, K and Sherlock, G and Dudoit, S},
biburl = {https://www.bibsonomy.org/bibtex/217815e9c3600bb9d3bd4e26a87743686/marcsaric},
description = {GC-content normalization for RNA-Seq data. - PubMed - NCBI},
doi = {10.1186/1471-2105-12-480},
interhash = {67ed5b6751cf46a85375eb129da0395e},
intrahash = {17815e9c3600bb9d3bd4e26a87743686},
journal = {BMC Bioinformatics},
keywords = {MUSTREAD fulltext quality-control rna-seq software},
month = dec,
pages = {480},
pmid = {22177264},
timestamp = {2017-09-30T11:18:26.000+0200},
title = {GC-content normalization for RNA-Seq data},
url = {https://www.ncbi.nlm.nih.gov/pubmed/22177264?dopt=Abstract},
volume = 12,
year = 2011
}