Article

Assembly free comparative genomics of short-read sequence data discovers the needles in the haystack

C. Cannon, C. Kua, D. Zhang, and J. Harting.
Molecular Ecology, (2010)
DOI: 10.1111/j.1365-294X.2009.04484.x

Abstract

Most comparative genomic analyses of short-read sequence (SRS) data rely upon the prior assembly of a reference sequence. Here, we present an assembly free analysis of SRS data that discovers sequence variants among focal genomes by tabulating the presence and frequency of ‘complex’ fragments in the data. Using data from nine tree species, we compare genomic diversity from populations to families. As a control, we simulated SRS data for three known plant genomes. The results provide insight into the quality and distributional bias of the sequencing reaction. Three main types of informative complexmers were identified, each possessing unique statistical properties. Type I complexmers are unique to a genome but suffer from a high false positive rate, being highly dependent on read coverage and distribution. Type II complexmers are shared between two genomes and can highlight potential copy-number differences. Type III complexmers are exclusive to a subset of genomes and can be useful for associating genetic differences with phenotypic or geographic variation. At the population level in an endangered timber species, numerous markers were identified that could potentially determine geographic origin of individuals and regulate international trade. We observed that the genomic data for the four fig species were more divergent than for stone oak species, possibly due to their complex pollination syndrome and high rates of gene flow. Our approach greatly enhances the application of SRS technology to the study of non-model organisms and directly identifies the most informative genetic elements for more detailed study and assembly.

BibTeX key: cannon2010assembly
entry type: article
year: 2010
journal: Molecular Ecology
pages: 147--161
publisher: Blackwell Publishing Ltd
volume: 19
issn: 1365-294X
DOI: 10.1111/j.1365-294X.2009.04484.x
url: http://dx.doi.org/10.1111/j.1365-294X.2009.04484.x


Cite this publication

%0 Journal Article
%1 cannon2010assembly
%A Cannon, Charles H.
%A Kua, Chai-Shian
%A Zhang, D.
%A Harting, J.r.
%D 2010
%I Blackwell Publishing Ltd
%J Molecular Ecology
%K assembly-free phylogenetics
%P 147--161
%R 10.1111/j.1365-294X.2009.04484.x
%T Assembly free comparative genomics of short-read sequence data discovers the needles in the haystack
%U http://dx.doi.org/10.1111/j.1365-294X.2009.04484.x
%V 19
%X Most comparative genomic analyses of short-read sequence (SRS) data rely upon the prior assembly of a reference sequence. Here, we present an assembly free analysis of SRS data that discovers sequence variants among focal genomes by tabulating the presence and frequency of ‘complex’ fragments in the data. Using data from nine tree species, we compare genomic diversity from populations to families. As a control, we simulated SRS data for three known plant genomes. The results provide insight into the quality and distributional bias of the sequencing reaction. Three main types of informative complexmers were identified, each possessing unique statistical properties. Type I complexmers are unique to a genome but suffer from a high false positive rate, being highly dependent on read coverage and distribution. Type II complexmers are shared between two genomes and can highlight potential copy-number differences. Type III complexmers are exclusive to a subset of genomes and can be useful for associating genetic differences with phenotypic or geographic variation. At the population level in an endangered timber species, numerous markers were identified that could potentially determine geographic origin of individuals and regulate international trade. We observed that the genomic data for the four fig species were more divergent than for stone oak species, possibly due to their complex pollination syndrome and high rates of gene flow. Our approach greatly enhances the application of SRS technology to the study of non-model organisms and directly identifies the most informative genetic elements for more detailed study and assembly.

@article{cannon2010assembly,
  abstract = {Most comparative genomic analyses of short-read sequence (SRS) data rely upon the prior assembly of a reference sequence. Here, we present an assembly free analysis of SRS data that discovers sequence variants among focal genomes by tabulating the presence and frequency of ‘complex’ fragments in the data. Using data from nine tree species, we compare genomic diversity from populations to families. As a control, we simulated SRS data for three known plant genomes. The results provide insight into the quality and distributional bias of the sequencing reaction. Three main types of informative complexmers were identified, each possessing unique statistical properties. Type I complexmers are unique to a genome but suffer from a high false positive rate, being highly dependent on read coverage and distribution. Type II complexmers are shared between two genomes and can highlight potential copy-number differences. Type III complexmers are exclusive to a subset of genomes and can be useful for associating genetic differences with phenotypic or geographic variation. At the population level in an endangered timber species, numerous markers were identified that could potentially determine geographic origin of individuals and regulate international trade. We observed that the genomic data for the four fig species were more divergent than for stone oak species, possibly due to their complex pollination syndrome and high rates of gene flow. Our approach greatly enhances the application of SRS technology to the study of non-model organisms and directly identifies the most informative genetic elements for more detailed study and assembly.},
  added-at = {2014-05-28T12:33:32.000+0200},
  author = {Cannon, Charles H. and Kua, Chai-Shian and Zhang, D. and Harting, J.r.},
  biburl = {https://www.bibsonomy.org/bibtex/25e74e20a856fd35edd971f4bffed816d/peter.ralph},
  doi = {10.1111/j.1365-294X.2009.04484.x},
  interhash = {3151e0027c1b7735f29b485bb799e798},
  intrahash = {5e74e20a856fd35edd971f4bffed816d},
  issn = {1365-294X},
  journal = {Molecular Ecology},
  keywords = {assembly-free phylogenetics},
  pages = {147--161},
  publisher = {Blackwell Publishing Ltd},
  timestamp = {2014-05-28T12:33:32.000+0200},
  title = {Assembly free comparative genomics of short-read sequence data discovers the needles in the haystack},
  url = {http://dx.doi.org/10.1111/j.1365-294X.2009.04484.x},
  volume = 19,
  year = 2010
}
