Article

Population genomics based on low coverage sequencing: how low should we go?

A. Buerkle and Z. Gompert.
Molecular Ecology, 22 (11): 3028--3035 (2013)
DOI: 10.1111/mec.12105

Abstract

Research in molecular ecology is now often based on large numbers of DNA sequence reads. Given a time and financial budget for DNA sequencing, the question arises as to how to allocate the finite number of sequence reads among three dimensions: (i) sequencing individual nucleotide positions repeatedly and achieving high confidence in the true genotype of individuals, (ii) sampling larger numbers of individuals from a population, and (iii) sampling a larger fraction of the genome. Leaving aside the question of what fraction of the genome to sample, we analyze the trade-off between repeatedly sequencing the same nucleotide position (coverage depth) and the number of individuals in the sample. We review simple Bayesian models for allele frequencies and utilize these in the analysis of how to obtain maximal information about population genetic parameters. The models indicate that sampling larger numbers of individuals, at the expense of coverage depth per nucleotide position, provides more information about population parameters. Dividing the sequencing effort maximally among individuals and obtaining approximately one read per locus and individual (1 × coverage) yields the most information about a population. Some analyses require genetic parameters for individuals, in which case Bayesian population models also support inference from lower coverage sequence data than are required for simple likelihood models. Low coverage sequencing is not only sufficient to support inference, but it is optimal to design studies to utilize low coverage because they will yield highly accurate and precise parameter estimates based on more individuals or sites in the genome.
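The depth-versus-individuals trade-off described in the abstract can be illustrated with a deliberately simplified calculation. This is a sketch under stated assumptions, not the Bayesian models reviewed in the paper: it assumes diploid individuals in Hardy-Weinberg proportions, no sequencing error, reads that each sample one of an individual's two gene copies with equal probability, and a simple pooled-read estimator (alternate-allele reads divided by total reads). Under those assumptions, the variance of the allele frequency estimate for n individuals at depth c is p(1 - p)(1 + 1/c) / (2n). The function name `pooled_estimate_variance` and its parameters are hypothetical, introduced only for this illustration.

```python
def pooled_estimate_variance(p, total_reads, depth):
    """Variance of a pooled-read allele frequency estimate at one locus.

    Illustrative assumptions (not taken from the paper): diploid individuals
    in Hardy-Weinberg proportions, error-free reads, and each read drawing
    one of the individual's two gene copies with probability 1/2.
    """
    if depth < 1 or total_reads % depth != 0:
        raise ValueError("depth must be >= 1 and divide total_reads evenly")
    n_individuals = total_reads // depth
    # Per-individual variance of the read proportion is
    # p(1-p)/2 (genotype sampling) + p(1-p)/(2*depth) (read sampling).
    return p * (1 - p) * (1 + 1 / depth) / (2 * n_individuals)


if __name__ == "__main__":
    budget = 1000  # fixed number of reads at the locus
    for depth in (1, 2, 4, 8):
        var = pooled_estimate_variance(0.5, budget, depth)
        print(f"depth={depth} individuals={budget // depth} variance={var:.6f}")
```

With the read budget held fixed, the variance in this toy model grows in proportion to (c + 1), so it is smallest at c = 1, which is consistent with the abstract's conclusion that about 1 × coverage is most informative about population parameters.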

BibTeX key: alexbuerkle2013population
entry type: article
year: 2013
journal: Molecular Ecology
number: 11
pages: 3028--3035
volume: 22
issn: 1365-294X
DOI: 10.1111/mec.12105
url: http://dx.doi.org/10.1111/mec.12105


Cite this publication

%0 Journal Article
%1 alexbuerkle2013population
%A Buerkle, Alex C.
%A Gompert, Zachariah
%D 2013
%J Molecular Ecology
%K coverage population_genomics sequencing_technology
%N 11
%P 3028--3035
%R 10.1111/mec.12105
%T Population genomics based on low coverage sequencing: how low should we go?
%U http://dx.doi.org/10.1111/mec.12105
%V 22
%X Research in molecular ecology is now often based on large numbers of DNA sequence reads. Given a time and financial budget for DNA sequencing, the question arises as to how to allocate the finite number of sequence reads among three dimensions: (i) sequencing individual nucleotide positions repeatedly and achieving high confidence in the true genotype of individuals, (ii) sampling larger numbers of individuals from a population, and (iii) sampling a larger fraction of the genome. Leaving aside the question of what fraction of the genome to sample, we analyze the trade-off between repeatedly sequencing the same nucleotide position (coverage depth) and the number of individuals in the sample. We review simple Bayesian models for allele frequencies and utilize these in the analysis of how to obtain maximal information about population genetic parameters. The models indicate that sampling larger numbers of individuals, at the expense of coverage depth per nucleotide position, provides more information about population parameters. Dividing the sequencing effort maximally among individuals and obtaining approximately one read per locus and individual (1 × coverage) yields the most information about a population. Some analyses require genetic parameters for individuals, in which case Bayesian population models also support inference from lower coverage sequence data than are required for simple likelihood models. Low coverage sequencing is not only sufficient to support inference, but it is optimal to design studies to utilize low coverage because they will yield highly accurate and precise parameter estimates based on more individuals or sites in the genome.

@article{alexbuerkle2013population,
  abstract = {Research in molecular ecology is now often based on large numbers of DNA sequence reads. Given a time and financial budget for DNA sequencing, the question arises as to how to allocate the finite number of sequence reads among three dimensions: (i) sequencing individual nucleotide positions repeatedly and achieving high confidence in the true genotype of individuals, (ii) sampling larger numbers of individuals from a population, and (iii) sampling a larger fraction of the genome. Leaving aside the question of what fraction of the genome to sample, we analyze the trade-off between repeatedly sequencing the same nucleotide position (coverage depth) and the number of individuals in the sample. We review simple Bayesian models for allele frequencies and utilize these in the analysis of how to obtain maximal information about population genetic parameters. The models indicate that sampling larger numbers of individuals, at the expense of coverage depth per nucleotide position, provides more information about population parameters. Dividing the sequencing effort maximally among individuals and obtaining approximately one read per locus and individual (1 × coverage) yields the most information about a population. Some analyses require genetic parameters for individuals, in which case Bayesian population models also support inference from lower coverage sequence data than are required for simple likelihood models. Low coverage sequencing is not only sufficient to support inference, but it is optimal to design studies to utilize low coverage because they will yield highly accurate and precise parameter estimates based on more individuals or sites in the genome.},
  added-at = {2014-01-22T23:46:09.000+0100},
  author = {Buerkle, Alex C. and Gompert, Zachariah},
  biburl = {https://www.bibsonomy.org/bibtex/25c991321e469e0a1f1e4f032ca0378a5/peter.ralph},
  doi = {10.1111/mec.12105},
  interhash = {0ca8328659c64dcf00b7481bbe58ebb2},
  intrahash = {5c991321e469e0a1f1e4f032ca0378a5},
  issn = {1365-294X},
  journal = {Molecular Ecology},
  keywords = {coverage population_genomics sequencing_technology},
  number = 11,
  pages = {3028--3035},
  timestamp = {2014-01-22T23:46:09.000+0100},
  title = {Population genomics based on low coverage sequencing: how low should we go?},
  url = {http://dx.doi.org/10.1111/mec.12105},
  volume = 22,
  year = 2013
}
