Missing genes in the annotation of prokaryotic genomes

A. Warren, J. Archuleta, W. Feng, and J. Setubal. BMC Bioinformatics, 11 (1): 131+ (Mar 15, 2010)
DOI: 10.1186/1471-2105-11-131

Abstract

BACKGROUND: Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore the question arises as to whether current genome annotations have systematically missing, small genes.

RESULTS: We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully-sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs, readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations). The vast majority of the missing genes found are small (less than 100 aa). A comparison of select examples with GeneMark, EasyGene and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs.

CONCLUSIONS: Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.
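The first step the abstract describes, collecting every ORF of at least 33 aa from a replicon, can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not say how ORFs were delimited, so the sketch assumes an ORF runs from the first ATG in a frame to the next in-frame stop codon (TAA, TAG, TGA), ignores alternative start codons, and scans all six reading frames. The names `find_orfs`, `reverse_complement` and `MIN_AA` are hypothetical; only the 33 aa threshold comes from the abstract.

```python
# Illustrative sketch only; the ORF definition below is an assumption,
# not the method used in the paper.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: alternative starts such as GTG/TTG are ignored
MIN_AA = 33          # length threshold quoted in the abstract

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=MIN_AA):
    """Scan six reading frames for ATG-to-stop ORFs of at least min_aa codons.

    Returns tuples (strand, frame, start, end, length_aa). start and end are
    0-based, half-open coordinates on the scanned strand; end excludes the
    stop codon, and length_aa counts codons from the start codon onward.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if orf_start is None:
                    if codon == START_CODON:
                        orf_start = i  # open an ORF at the first ATG seen
                elif codon in STOP_CODONS:
                    length_aa = (i - orf_start) // 3
                    if length_aa >= min_aa:
                        orfs.append((strand, frame, orf_start, i, length_aa))
                    orf_start = None  # close the ORF and keep scanning
    return orfs
```

For example, `find_orfs("ATG" + "GCT" * 32 + "TAA")` reports a single 33-codon ORF on the forward strand, while the same sequence with one fewer `GCT` falls below the threshold and is skipped. The all-against-all comparison and taxonomic-diversity filtering described in the abstract are outside the scope of this sketch.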

Links and resources

BibTeX key: Warren2010Missing
entry type: article
year: 2010
month: mar
day: 15
journal: BMC Bioinformatics
number: 1
pages: 131+
volume: 11
citeulike-article-id: 6856068
citeulike-linkout-2: http://www.hubmed.org/display.cgi?uids=20230630
citeulike-linkout-1: http://view.ncbi.nlm.nih.gov/pubmed/20230630
pmid: 20230630
priority: 2
posted-at: 2010-04-14 12:37:41
issn: 1471-2105
citeulike-linkout-0: http://dx.doi.org/10.1186/1471-2105-11-131
DOI: 10.1186/1471-2105-11-131
url: http://dx.doi.org/10.1186/1471-2105-11-131

Cite this publication

%0 Journal Article
%1 Warren2010Missing
%A Warren, Andrew
%A Archuleta, Jeremy
%A Feng, Wu C.
%A Setubal, Joao
%D 2010
%J BMC Bioinformatics
%K annotation high-performance-computing
%N 1
%P 131+
%R 10.1186/1471-2105-11-131
%T Missing genes in the annotation of prokaryotic genomes
%U http://dx.doi.org/10.1186/1471-2105-11-131
%V 11
%X BACKGROUND: Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore the question arises as to whether current genome annotations have systematically missing, small genes. RESULTS: We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully-sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs, readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations). The vast majority of the missing genes found are small (less than 100 aa). A comparison of select examples with GeneMark, EasyGene and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs. CONCLUSIONS: Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.

BibSonomy
