Automatic discovery of cross-family sequence features associated with protein function

M. Brameier, J. Haan, A. Krings, and R. MacCallum. BMC Bioinformatics [electronic resource], (January 2006)
DOI: 10.1186/1471-2105-7-16

Abstract

Background: Methods for predicting protein function directly from amino acid sequences are useful tools in the study of uncharacterised protein families and in comparative genomics. Until now, this problem has been approached using machine learning techniques that attempt to predict membership, or otherwise, to predefined functional categories or subcellular locations. A potential drawback of this approach is that the human-designated functional classes may not accurately reflect the underlying biology, and consequently important sequence-to-function relationships may be missed.

Results: We show that a self-supervised data mining approach is able to find relationships between sequence features and functional annotations. No preconceived ideas about functional categories are required, and the training data is simply a set of protein sequences and their UniProt/Swiss-Prot annotations. The main technical aspect of the approach is the co-evolution of amino acid-based regular expressions and keyword-based logical expressions with genetic programming. Our experiments on a strictly non-redundant set of eukaryotic proteins reveal that the strongest and most easily detected sequence-to-function relationships are concerned with targeting to various cellular compartments, which is an area already well studied both experimentally and computationally. Of more interest are a number of broad functional roles which can also be correlated with sequence features. These include inhibition, biosynthesis, transcription and defence against bacteria. Despite substantial overlaps between these functions and their corresponding cellular compartments, we find clear differences in the sequence motifs used to predict some of these functions. For example, the presence of polyglutamine repeats appears to be linked more strongly to the "transcription" function than to the general "nuclear" function/location.

Conclusion: We have developed a novel and useful approach for knowledge discovery in annotated sequence data. The technique is able to identify functionally important sequence features and does not require expert knowledge. By viewing protein function from a sequence perspective, the approach is also suitable for discovering unexpected links between biological processes, such as the recently discovered role of ubiquitination in transcription.
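
The pairing of a sequence-side regular expression with an annotation-side keyword expression can be illustrated with a small sketch. Everything below is an assumption made for illustration: the toy proteins and keyword sets are invented, the helper names (`sequence_feature`, `keyword_feature`, `agreement`) are hypothetical, and the Matthews correlation is only one plausible way to score how well two boolean predicates agree. The abstract does not state the fitness function actually used, and no genetic programming search is performed here.

```python
import re
from math import sqrt

# Invented toy data (NOT from the paper): (sequence, annotation keywords).
PROTEINS = [
    ("MSTQQQQQQQQAPLK", {"transcription", "nuclear"}),
    ("MKRQQQQQQQLLDE", {"transcription"}),
    ("MALWTRLLPLLALLA", {"signal", "secreted"}),
    ("MKKRPREEDSGNLV", {"nuclear"}),
]


def sequence_feature(pattern):
    """Return a predicate that is true when the regex occurs in a sequence."""
    rx = re.compile(pattern)
    return lambda seq: rx.search(seq) is not None


def keyword_feature(required, excluded=frozenset()):
    """Return a predicate: all `required` keywords present, none of `excluded`."""
    required, excluded = set(required), set(excluded)
    return lambda kws: required <= kws and not (excluded & kws)


def agreement(seq_pred, kw_pred, proteins):
    """Matthews correlation between the two predicates (an assumed score)."""
    tp = fp = fn = tn = 0
    for seq, kws in proteins:
        s, k = seq_pred(seq), kw_pred(kws)
        if s and k:
            tp += 1
        elif s and not k:
            fp += 1
        elif k:
            fn += 1
        else:
            tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# A polyglutamine repeat of six or more residues, scored against two keywords.
poly_q = sequence_feature(r"Q{6,}")
score_transcription = agreement(poly_q, keyword_feature({"transcription"}), PROTEINS)
score_nuclear = agreement(poly_q, keyword_feature({"nuclear"}), PROTEINS)
```

In a co-evolutionary setting, a score of this kind could serve as the shared fitness of a regular expression and a keyword expression, so that each side is rewarded for predicting the other. The toy data is constructed so that the repeat agrees better with "transcription" than with "nuclear"; that outcome is built in by hand and is not evidence for the paper's finding.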

Links and resources

BibTeX key: oai:biomedcentral.com:1471-2105-7-16
Entry type: article
Year: 2006
Month: January 12
Journal: BMC Bioinformatics [electronic resource]
Number: 16
Publisher: BioMed Central Ltd.
Volume: 7
ISSN: 1471-2105
Bibsource: OAI-PMH server at www.biomedcentral.com
Rights: Copyright 2006 Brameier et al; licensee BioMed Central Ltd.
Size: 16 pages
OAI: oai:biomedcentral.com:1471-2105-7-16
Language: en
Notes: PMID: 16409628
DOI: 10.1186/1471-2105-7-16
URL: http://www.biomedcentral.com/1471-2105/7/16

Cite this publication

%0 Journal Article
%1 oai:biomedcentral.com:1471-2105-7-16
%A Brameier, Markus
%A Haan, Josien
%A Krings, Andrea
%A MacCallum, Robert M
%D 2006
%I BioMed Central Ltd.
%J BMC bioinformatics electronic resource
%K algorithms, genetic programming
%N 16
%R doi:10.1186/1471-2105-7-16
%T Automatic discovery of cross-family sequence features associated with protein function
%U http://www.biomedcentral.com/1471-2105/7/16
%V 7

@article{oai:biomedcentral.com:1471-2105-7-16,
  added-at = {2008-06-19T17:35:00.000+0200},
  author = {Brameier, Markus and Haan, Josien and Krings, Andrea and MacCallum, Robert M},
  bibsource = {OAI-PMH server at www.biomedcentral.com},
  biburl = {https://www.bibsonomy.org/bibtex/2dce235c02e5f81ec75d14988f43df44c/brazovayeye},
  doi = {doi:10.1186/1471-2105-7-16},
  interhash = {f43b14221f6198fe1245a86fccdd734b},
  intrahash = {dce235c02e5f81ec75d14988f43df44c},
  issn = {1471-2105},
  journal = {BMC bioinformatics [electronic resource]},
  keywords = {algorithms, genetic programming},
  language = {en},
  month = {January~12},
  notes = {PMID: 16409628},
  number = 16,
  oai = {oai:biomedcentral.com:1471-2105-7-16},
  publisher = {BioMed Central Ltd.},
  rights = {Copyright 2006 Brameier et al; licensee BioMed Central Ltd.},
  size = {16 pages},
  timestamp = {2008-06-19T17:36:55.000+0200},
  title = {Automatic discovery of cross-family sequence features associated with protein function},
  url = {http://www.biomedcentral.com/1471-2105/7/16},
  volume = 7,
  year = 2006
}

BibSonomy