
Creating hierarchical models of protein families based on Expressed Sequence Tags: the "Sprockets" analysis pipeline.

P. Gordon, C. Weinel, C. Jacobi, U. Kampf, E. Kriventseva, and C. Sensen. Anal Chim Acta, 564 (1): 123-32 (2006)

Abstract

We have created an analysis pipeline called Sprockets, which can be used to classify proteins into various hierarchical "families", and build searchable models of these families. The construction of these families is based on data from Expressed Sequence Tags (ESTs) and Coding DNA Sequences (CDSs), making Sprockets clusters especially suitable for studying gene families in organisms for which the completely sequenced genome does not (yet) exist. The pipeline consists of two main parts: pair-wise analysis and grouping of sequences with Z-score statistics, followed by hierarchical splitting of clusters into alignable protein families. Various computational and statistical techniques applied in Sprockets allow it to act like a massive and selective multiple sequence alignment engine for combining individual sequence collections and related public sequences. The end result is a database of gene Hidden Markov Models, each related to the other by three levels of similarity: secondary structure, function and evolutionary origin. For a sample 20,000 EST set from Lactuca spp., Sprockets provided a 9% improvement in mapping of function to unknown sequences over traditional pair-wise search methods and InterPro mapping.
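The abstract mentions grouping sequences by pair-wise analysis with Z-score statistics but does not give the procedure. As a rough illustration only, the sketch below computes a shuffle-based Z-score in the commonly used sense: the observed pair score is compared against the scores obtained after shuffling one sequence. The scoring function `toy_score`, the parameter names, and the shuffle count are all assumptions made for this example, and none of it is taken from the Sprockets implementation.

```python
import random
import statistics


def toy_score(a: str, b: str) -> int:
    """Hypothetical stand-in for an alignment score: identical positions."""
    return sum(x == y for x, y in zip(a, b))


def z_score(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Shuffle-based Z-score of the pair (a, b).

    Assumption: Z = (observed - mean(shuffled)) / std(shuffled), where the
    background comes from scoring `a` against random permutations of `b`.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    observed = toy_score(a, b)
    letters = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # preserves composition, destroys order
        background.append(toy_score(a, "".join(letters)))
    mean = statistics.fmean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        return 0.0  # no variation in the background; Z is undefined
    return (observed - mean) / sd


if __name__ == "__main__":
    query = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up peptide, for illustration
    print(z_score(query, query))
```

A pair would then be placed in the same cluster when its Z-score exceeds some threshold; the abstract does not state which threshold or scoring scheme Sprockets uses.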

Links and resources

BibTeX key: AnalChimActa.:564:123
entry type: article
year: 2006
journal: Anal Chim Acta
number: 1
pages: 123-32
volume: 564

Cite this publication

@article{AnalChimActa.:564:123,
  abstract = {We have created an analysis pipeline called Sprockets, which can be used to classify proteins into various hierarchical "families", and build searchable models of these families. The construction of these families is based on data from Expressed Sequence Tags (ESTs) and Coding DNA Sequences (CDSs), making Sprockets clusters especially suitable for studying gene families in organisms for which the completely sequenced genome does not (yet) exist. The pipeline consists of two main parts: pair-wise analysis and grouping of sequences with Z-score statistics, followed by hierarchical splitting of clusters into alignable protein families. Various computational and statistical techniques applied in Sprockets allow it to act like a massive and selective multiple sequence alignment engine for combining individual sequence collections and related public sequences. The end result is a database of gene Hidden Markov Models, each related to the other by three levels of similarity: secondary structure, function and evolutionary origin. For a sample 20,000 EST set from Lactuca spp., Sprockets provided a 9% improvement in mapping of function to unknown sequences over traditional pair-wise search methods and InterPro mapping.},
  added-at = {2009-08-20T18:20:15.000+0200},
  author = {Gordon, Paul M K and Weinel, Christian and Jacobi, Carsten and Kampf, Udo and Kriventseva, Evgenia and Sensen, Christoph W},
  biburl = {https://www.bibsonomy.org/bibtex/2839e148d029917518605871fc0481456/cjacobi},
  interhash = {9463b0fc79e12cfdfa00c7af1b466738},
  intrahash = {839e148d029917518605871fc0481456},
  journal = {Anal Chim Acta},
  keywords = {Bioinformatics Carsten Jacobi Publikation Science},
  number = 1,
  pages = {123-32},
  timestamp = {2010-02-07T22:39:05.000+0100},
  title = {Creating hierarchical models of protein families based on {E}xpressed {S}equence {T}ags: the "{S}prockets" analysis pipeline.},
  volume = 564,
  year = 2006
}

BibSonomy
