The Construction and Use of Log-Odds Substitution Scores for Multiple Sequence Alignment

S. Altschul, J. Wootton, E. Zaslavsky, and Y. Yu. PLoS Comput Biol, 6 (7): e1000852+ (Jul 15, 2010)
DOI: 10.1371/journal.pcbi.1000852

Abstract

Most pairwise and multiple sequence alignment programs seek alignments with optimal scores. Central to defining such scores is selecting a set of substitution scores for aligned amino acids or nucleotides. For local pairwise alignment, substitution scores are implicitly of log-odds form. We now extend the log-odds formalism to multiple alignments, using Bayesian methods to construct "BILD" ("Bayesian Integral Log-odds") substitution scores from prior distributions describing columns of related letters. This approach has been used previously only to define scores for aligning individual sequences to sequence profiles, but it has much broader applicability. We describe how to calculate BILD scores efficiently, and illustrate their uses in Gibbs sampling optimization procedures, gapped alignment, and the construction of hidden Markov model profiles. BILD scores enable automated selection of optimal motif and domain model widths, and can inform the decision of whether to include a sequence in a multiple alignment, and the selection of insertion and deletion locations. Other applications include the classification of related sequences into subfamilies, and the definition of profile-profile alignment scores. Although a fully realized multiple alignment program must rely upon more than substitution scores, many existing multiple alignment programs can be modified to employ BILD scores. We illustrate how simple BILD score based strategies can enhance the recognition of DNA binding domains, including the Api-AP2 domain in Toxoplasma gondii and Plasmodium falciparum.

Multiple sequence alignment is a fundamental tool of biological research, widely used to identify important regions of DNA or protein molecules, to infer their biological functions, to reconstruct ancestries, and in numerous other applications. The effectiveness and accuracy of sequence comparison programs depends crucially upon the quality of the scoring systems they use to measure sequence similarity. To compare pairs of DNA or protein sequences, the best strategy for constructing similarity measures has long been understood, but there has been a lack of consensus about how to measure similarity among multiple (i.e. more than two) sequences. In this paper, we describe a natural generalization to multiple alignment of the accepted measure of pairwise similarity. A large variety of methods that are used to compare and analyze DNA or protein molecules, or to model protein domain families, could be rendered more sensitive and precise by adopting this similarity measure. We illustrate how our measure can enhance the recognition of important DNA binding domains.

Links and resources

BibTeX key: Altschul2010Construction
entry type: article
year: 2010
month: jul
day: 15
journal: PLoS Comput Biol
number: 7
pages: e1000852+
publisher: Public Library of Science
volume: 6
citeulike-article-id: 7501298
priority: 2
posted-at: 2010-07-19 10:35:43
citeulike-linkout-0: http://dx.doi.org/10.1371/journal.pcbi.1000852
DOI: 10.1371/journal.pcbi.1000852
url: http://dx.doi.org/10.1371/journal.pcbi.1000852

Cite this publication

%0 Journal Article
%1 Altschul2010Construction
%A Altschul, Stephen F.
%A Wootton, John C.
%A Zaslavsky, Elena
%A Yu, Yi-Kuo
%D 2010
%I Public Library of Science
%J PLoS Comput Biol
%K bioinformatics multiple-sequence-alignment sequence-alignment sequence-analysis
%N 7
%P e1000852+
%R 10.1371/journal.pcbi.1000852
%T The Construction and Use of Log-Odds Substitution Scores for Multiple Sequence Alignment
%U http://dx.doi.org/10.1371/journal.pcbi.1000852
%V 6
%X Most pairwise and multiple sequence alignment programs seek alignments with optimal scores. Central to defining such scores is selecting a set of substitution scores for aligned amino acids or nucleotides. For local pairwise alignment, substitution scores are implicitly of log-odds form. We now extend the log-odds formalism to multiple alignments, using Bayesian methods to construct "BILD" ("Bayesian Integral Log-odds") substitution scores from prior distributions describing columns of related letters. This approach has been used previously only to define scores for aligning individual sequences to sequence profiles, but it has much broader applicability. We describe how to calculate BILD scores efficiently, and illustrate their uses in Gibbs sampling optimization procedures, gapped alignment, and the construction of hidden Markov model profiles. BILD scores enable automated selection of optimal motif and domain model widths, and can inform the decision of whether to include a sequence in a multiple alignment, and the selection of insertion and deletion locations. Other applications include the classification of related sequences into subfamilies, and the definition of profile-profile alignment scores. Although a fully realized multiple alignment program must rely upon more than substitution scores, many existing multiple alignment programs can be modified to employ BILD scores. We illustrate how simple BILD score based strategies can enhance the recognition of DNA binding domains, including the Api-AP2 domain in Toxoplasma gondii and Plasmodium falciparum. Multiple sequence alignment is a fundamental tool of biological research, widely used to identify important regions of DNA or protein molecules, to infer their biological functions, to reconstruct ancestries, and in numerous other applications. The effectiveness and accuracy of sequence comparison programs depends crucially upon the quality of the scoring systems they use to measure sequence similarity. To compare pairs of DNA or protein sequences, the best strategy for constructing similarity measures has long been understood, but there has been a lack of consensus about how to measure similarity among multiple (i.e. more than two) sequences. In this paper, we describe a natural generalization to multiple alignment of the accepted measure of pairwise similarity. A large variety of methods that are used to compare and analyze DNA or protein molecules, or to model protein domain families, could be rendered more sensitive and precise by adopting this similarity measure. We illustrate how our measure can enhance the recognition of important DNA binding domains.

@article{Altschul2010Construction,
  abstract = {Most pairwise and multiple sequence alignment programs seek alignments with optimal scores. Central to defining such scores is selecting a set of substitution scores for aligned amino acids or nucleotides. For local pairwise alignment, substitution scores are implicitly of log-odds form. We now extend the log-odds formalism to multiple alignments, using Bayesian methods to construct "{BILD}" ("Bayesian Integral Log-odds") substitution scores from prior distributions describing columns of related letters. This approach has been used previously only to define scores for aligning individual sequences to sequence profiles, but it has much broader applicability. We describe how to calculate {BILD} scores efficiently, and illustrate their uses in Gibbs sampling optimization procedures, gapped alignment, and the construction of hidden Markov model profiles. {BILD} scores enable automated selection of optimal motif and domain model widths, and can inform the decision of whether to include a sequence in a multiple alignment, and the selection of insertion and deletion locations. Other applications include the classification of related sequences into subfamilies, and the definition of profile-profile alignment scores. Although a fully realized multiple alignment program must rely upon more than substitution scores, many existing multiple alignment programs can be modified to employ {BILD} scores. We illustrate how simple {BILD} score based strategies can enhance the recognition of {DNA} binding domains, including the {Api-AP2} domain in Toxoplasma gondii and Plasmodium falciparum. Multiple sequence alignment is a fundamental tool of biological research, widely used to identify important regions of {DNA} or protein molecules, to infer their biological functions, to reconstruct ancestries, and in numerous other applications. The effectiveness and accuracy of sequence comparison programs depends crucially upon the quality of the scoring systems they use to measure sequence similarity. To compare pairs of {DNA} or protein sequences, the best strategy for constructing similarity measures has long been understood, but there has been a lack of consensus about how to measure similarity among multiple (i.e. more than two) sequences. In this paper, we describe a natural generalization to multiple alignment of the accepted measure of pairwise similarity. A large variety of methods that are used to compare and analyze {DNA} or protein molecules, or to model protein domain families, could be rendered more sensitive and precise by adopting this similarity measure. We illustrate how our measure can enhance the recognition of important {DNA} binding domains.},
  added-at = {2018-12-02T16:09:07.000+0100},
  author = {Altschul, Stephen F. and Wootton, John C. and Zaslavsky, Elena and Yu, Yi-Kuo},
  biburl = {https://www.bibsonomy.org/bibtex/2ec1db0d482ba359e081be73ae8d9c235/karthikraman},
  citeulike-article-id = {7501298},
  citeulike-linkout-0 = {http://dx.doi.org/10.1371/journal.pcbi.1000852},
  day = 15,
  doi = {10.1371/journal.pcbi.1000852},
  interhash = {988333f1e390e4357324be53aed7cf5a},
  intrahash = {ec1db0d482ba359e081be73ae8d9c235},
  journal = {PLoS Comput Biol},
  keywords = {bioinformatics multiple-sequence-alignment sequence-alignment sequence-analysis},
  month = jul,
  number = 7,
  pages = {e1000852+},
  posted-at = {2010-07-19 10:35:43},
  priority = {2},
  publisher = {Public Library of Science},
  timestamp = {2018-12-02T16:09:07.000+0100},
  title = {The Construction and Use of {Log-Odds} Substitution Scores for Multiple Sequence Alignment},
  url = {http://dx.doi.org/10.1371/journal.pcbi.1000852},
  volume = 6,
  year = 2010
}
