Article

Methods for Assessing the Statistical Significance of Molecular Sequence Features by Using General Scoring Schemes

S. Karlin and S. F. Altschul.
Proceedings of the National Academy of Sciences of the United States of America, 87 (6): pp. 2264-2268 (1990)

Abstract

An unusual pattern in a nucleic acid or protein sequence or a region of strong similarity shared by two or more sequences may have biological significance. It is therefore desirable to know whether such a pattern can have arisen simply by chance. To identify interesting sequence patterns, appropriate scoring values can be assigned to the individual residues of a single sequence or to sets of residues when several sequences are compared. For single sequences, such scores can reflect biophysical properties such as charge, volume, hydrophobicity, or secondary structure potential; for multiple sequences, they can reflect nucleotide or amino acid similarity measured in a wide variety of ways. Using an appropriate random model, we present a theory that provides precise numerical formulas for assessing the statistical significance of any region with high aggregate score. A second class of results describes the composition of high-scoring segments. In certain contexts, these permit the choice of scoring systems which are "optimal" for distinguishing biologically relevant patterns. Examples are given of applications of the theory to a variety of protein sequences, highlighting segments with unusual biological features. These include distinctive charge regions in transcription factors and protooncogene products, pronounced hydrophobic segments in various receptor and transport proteins, and statistically significant subalignments involving the recently characterized cystic fibrosis gene.
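The two classes of results described in the abstract can be illustrated with a small sketch. It assumes the standard formulation of the theory: for residue scores s_i with background probabilities p_i, a negative expected score, and at least one positive score, λ is the unique positive root of Σ p_i e^(λ s_i) = 1; the expected number of distinct segments scoring at least S in a sequence of length N is approximately K N e^(-λS); and residues inside high-scoring segments occur with limiting frequencies q_i = p_i e^(λ s_i). The function names, the two-letter "H"/"P" alphabet, its background frequencies, and the value of K are hypothetical illustrations. The paper gives an explicit (and more involved) expression for K that is not reproduced here, and the maximal segment is found with Kadane's linear scan, a standard choice rather than necessarily the authors' own procedure.

```python
import math


def karlin_lambda(probs, scores, iters=100):
    """Unique positive root of sum_i p_i * exp(lam * s_i) = 1, found by bisection.

    Requires a negative expected score and at least one positive score,
    the conditions under which the positive root exists and is unique.
    """
    if sum(p * s for p, s in zip(probs, scores)) >= 0:
        raise ValueError("expected score must be negative")
    if not any(p > 0 and s > 0 for p, s in zip(probs, scores)):
        raise ValueError("need at least one positive score")

    def f(lam):
        return sum(p * math.exp(lam * s) for p, s in zip(probs, scores)) - 1.0

    hi = 1.0
    while f(hi) <= 0:        # f is convex, f(0) = 0 and f dips below 0; grow hi until f > 0
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid         # mid lies left of the positive root
        else:
            hi = mid         # mid lies at or right of the positive root
    return 0.5 * (lo + hi)


def max_segment(values):
    """Kadane's scan: (best score, start, end) of the maximal-scoring segment values[start:end]."""
    best, best_start, best_end = 0, 0, 0   # the empty segment scores 0
    run, run_start = 0, 0
    for i, v in enumerate(values):
        if run <= 0:
            run, run_start = v, i          # a non-positive prefix never helps; restart here
        else:
            run += v
        if run > best:
            best, best_start, best_end = run, run_start, i + 1
    return best, best_start, best_end


def evalue(score, n, lam, k):
    """Expected number of distinct segments scoring >= score: K * N * exp(-lam * score)."""
    return k * n * math.exp(-lam * score)


def pvalue(score, n, lam, k):
    """Poisson approximation: probability of at least one segment scoring >= score."""
    return 1.0 - math.exp(-evalue(score, n, lam, k))


def target_frequencies(probs, scores, lam):
    """Limiting residue frequencies in high-scoring segments: q_i = p_i * exp(lam * s_i)."""
    return [p * math.exp(lam * s) for p, s in zip(probs, scores)]


# Toy two-class alphabet (assumed): "H" (e.g. hydrophobic) scores +1, "P" scores -1,
# with assumed background frequencies 0.25 and 0.75.
SCORES = {"H": 1, "P": -1}
BACKGROUND = {"H": 0.25, "P": 0.75}
K_ASSUMED = 0.1  # hypothetical K; the paper derives K from the score distribution

letters = sorted(SCORES)
probs = [BACKGROUND[c] for c in letters]
scores = [SCORES[c] for c in letters]
lam = karlin_lambda(probs, scores)  # ln 3 for this toy scheme

seq = "PPHHHPHHPPP"
best, start, end = max_segment([SCORES[c] for c in seq])
print(seq[start:end], best)                     # HHHPHH 4
print(evalue(best, len(seq), lam, K_ASSUMED))
print(target_frequencies(probs, scores, lam))   # approximately [0.75, 0.25]
```

In this toy scheme λ = ln 3, and ln(q_i/p_i)/λ returns exactly the original scores ±1; this is the sense in which log-odds scores built from target frequencies are "optimal" for detecting segments with that composition. For a comparison of two sequences of lengths m and n, the same expression is typically applied with N replaced by the product m n.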

BibTeX key: 1990
entry type: article
year: 1990
journal: Proceedings of the National Academy of Sciences of the United States of America
number: 6
pages: pp. 2264-2268
publisher: National Academy of Sciences
volume: 87
issn: 00278424
language: English
jstor_formatteddate: Mar., 1990
jstor_articletype: research-article
url: http://www.jstor.org/stable/2354031


Cite this publication

@article{1990,
  abstract = {An unusual pattern in a nucleic acid or protein sequence or a region of strong similarity shared by two or more sequences may have biological significance. It is therefore desirable to know whether such a pattern can have arisen simply by chance. To identify interesting sequence patterns, appropriate scoring values can be assigned to the individual residues of a single sequence or to sets of residues when several sequences are compared. For single sequences, such scores can reflect biophysical properties such as charge, volume, hydrophobicity, or secondary structure potential; for multiple sequences, they can reflect nucleotide or amino acid similarity measured in a wide variety of ways. Using an appropriate random model, we present a theory that provides precise numerical formulas for assessing the statistical significance of any region with high aggregate score. A second class of results describes the composition of high-scoring segments. In certain contexts, these permit the choice of scoring systems which are "optimal" for distinguishing biologically relevant patterns. Examples are given of applications of the theory to a variety of protein sequences, highlighting segments with unusual biological features. These include distinctive charge regions in transcription factors and protooncogene products, pronounced hydrophobic segments in various receptor and transport proteins, and statistically significant subalignments involving the recently characterized cystic fibrosis gene.},
  added-at = {2013-04-10T17:51:32.000+0200},
  author = {Karlin, Samuel and Altschul, Stephen F.},
  biburl = {https://www.bibsonomy.org/bibtex/281979ad257e129127a597bfc1959881f/ytyoun},
  interhash = {e8ccb1d43fabbf76656a5138b25e7a2d},
  intrahash = {81979ad257e129127a597bfc1959881f},
  issn = {00278424},
  journal = {Proceedings of the National Academy of Sciences of the United States of America},
  jstor_articletype = {research-article},
  jstor_formatteddate = {Mar., 1990},
  keywords = {alignment bioinformatics smith-waterman},
  language = {English},
  number = 6,
  pages = {2264--2268},
  publisher = {National Academy of Sciences},
  timestamp = {2013-04-10T17:51:32.000+0200},
  title = {Methods for Assessing the Statistical Significance of Molecular Sequence Features by Using General Scoring Schemes},
  url = {http://www.jstor.org/stable/2354031},
  volume = 87,
  year = 1990
}
