Article

Searching for discrimination rules in protease proteolytic cleavage activity using genetic programming with a min-max scoring function

Z. Yang, R. Thomson, T. Hodgman, J. Dry, A. Doyle, A. Narayanan, and X. Wu.
Biosystems, 72 (1-2): 159--176 (November 2003)
DOI: 10.1016/S0303-2647(03)00141-2

Abstract

We present an algorithm which is able to extract discriminant rules from oligopeptides for protease proteolytic cleavage activity prediction. The algorithm is developed using previous genetic programming. Three important components in the algorithm are a min-max scoring function, the reverse Polish notation (RPN) and the use of minimum description length. The min-max scoring function is developed using amino acid similarity matrices for measuring the similarity between an oligopeptide and a rule, which is a complex algebraic equation of amino acids rather than a simple pattern sequence. The Fisher ratio is then calculated on the scoring values using the class label associated with the oligopeptides. The discriminant ability of each rule can therefore be evaluated. The use of RPN makes the evolutionary operations simpler and therefore reduces the computational cost. To prevent overfitting, the concept of minimum description length is used to penalize over-complicated rules. A fitness function is therefore composed of the Fisher ratio and the use of minimum description length for an efficient evolutionary process. In the application to four protease datasets (Trypsin, Factor Xa, Hepatitis C Virus and HIV protease cleavage site prediction), our algorithm is superior to C5, a conventional method for deriving decision trees.
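The abstract names the ingredients of the fitness function but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a rule is a list of RPN tokens whose operands are hypothetical (position, residue) pairs and whose operators are "min" and "max", that the similarity table holds made-up toy values rather than a real amino acid similarity matrix, that the Fisher ratio takes the common two-class form (squared difference of class means over the sum of class variances), and that a plain per-token length penalty stands in for the minimum description length term.

```python
from statistics import mean, pvariance

# Toy similarity table: hypothetical values, NOT a real substitution matrix.
TOY_SIMILARITY = {
    ("K", "K"): 5, ("R", "R"): 5, ("K", "R"): 2, ("R", "K"): 2,
    ("A", "A"): 4, ("G", "G"): 6,
}


def similarity(a, b):
    """Look up a toy similarity; unlisted pairs score -1 (an assumption)."""
    return TOY_SIMILARITY.get((a, b), -1)


def eval_rpn(rule, peptide):
    """Score one peptide against a rule written in reverse Polish notation.

    Operands are (position, residue) pairs; operators are "min" and "max".
    A stack is all that is needed, which is why RPN keeps evaluation simple.
    """
    stack = []
    for token in rule:
        if token in ("min", "max"):
            right = stack.pop()
            left = stack.pop()
            stack.append(min(left, right) if token == "min" else max(left, right))
        else:
            position, residue = token
            stack.append(similarity(peptide[position], residue))
    if len(stack) != 1:
        raise ValueError("malformed RPN rule")
    return stack[0]


def fisher_ratio(scores, labels):
    """Two-class Fisher ratio: (mean1 - mean0)^2 / (var1 + var0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Degenerate case: no within-class spread at all.
        return float("inf") if mean(pos) != mean(neg) else 0.0
    return (mean(pos) - mean(neg)) ** 2 / spread


def fitness(rule, peptides, labels, penalty=0.01):
    """Fisher ratio minus a length penalty (a stand-in for the MDL term)."""
    scores = [eval_rpn(rule, p) for p in peptides]
    return fisher_ratio(scores, labels) - penalty * len(rule)


if __name__ == "__main__":
    # Hypothetical rule: min(max(pos0~K, pos0~R), pos1~A)
    rule = [(0, "K"), (0, "R"), "max", (1, "A"), "min"]
    print(eval_rpn(rule, "KA"), eval_rpn(rule, "GA"))
    print(fitness([(0, "K")], ["KA", "RA", "GA", "AA"], [1, 1, 0, 0]))
```

Under these assumptions, a longer rule has to earn its extra tokens through a higher Fisher ratio, which is the overfitting guard the abstract attributes to minimum description length.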

BibTeX key: ZhengRongYang:2003:BS
entry type: article
year: 2003
month: November
journal: Biosystems
number: 1-2
pages: 159--176
volume: 72
DOI: 10.1016/S0303-2647(03)00141-2
url: http://www.sciencedirect.com/science/article/B6T2K-49N9DN6-2/2/0d63ebb7904ac33ae0d20ce4f6477a57


Cite this publication

%0 Journal Article
%1 ZhengRongYang:2003:BS
%A Yang, Zheng Rong
%A Thomson, Rebecca
%A Hodgman, T. Charles
%A Dry, Jonathan
%A Doyle, Austin K.
%A Narayanan, Ajit
%A Wu, XiKun
%D 2003
%J Biosystems
%K Amino Polish Proteolytic The acid algorithms, analysis cleavage genetic matrix, notation, programming, reverse similarity
%N 1-2
%P 159--176
%R doi:10.1016/S0303-2647(03)00141-2
%T Searching for discrimination rules in protease proteolytic cleavage activity using genetic programming with a min-max scoring function
%U http://www.sciencedirect.com/science/article/B6T2K-49N9DN6-2/2/0d63ebb7904ac33ae0d20ce4f6477a57
%V 72
%X We present an algorithm which is able to extract discriminant rules from oligopeptides for protease proteolytic cleavage activity prediction. The algorithm is developed using previous genetic programming. Three important components in the algorithm are a min-max scoring function, the reverse Polish notation (RPN) and the use of minimum description length. The min-max scoring function is developed using amino acid similarity matrices for measuring the similarity between an oligopeptide and a rule, which is a complex algebraic equation of amino acids rather than a simple pattern sequence. The Fisher ratio is then calculated on the scoring values using the class label associated with the oligopeptides. The discriminant ability of each rule can therefore be evaluated. The use of RPN makes the evolutionary operations simpler and therefore reduces the computational cost. To prevent overfitting, the concept of minimum description length is used to penalize over-complicated rules. A fitness function is therefore composed of the Fisher ratio and the use of minimum description length for an efficient evolutionary process. In the application to four protease datasets (Trypsin, Factor Xa, Hepatitis C Virus and HIV protease cleavage site prediction), our algorithm is superior to C5, a conventional method for deriving decision trees.

@article{ZhengRongYang:2003:BS,
  abstract = {We present an algorithm which is able to extract discriminant rules from oligopeptides for protease proteolytic cleavage activity prediction. The algorithm is developed using previous genetic programming. Three important components in the algorithm are a min-max scoring function, the reverse Polish notation (RPN) and the use of minimum description length. The min-max scoring function is developed using amino acid similarity matrices for measuring the similarity between an oligopeptide and a rule, which is a complex algebraic equation of amino acids rather than a simple pattern sequence. The Fisher ratio is then calculated on the scoring values using the class label associated with the oligopeptides. The discriminant ability of each rule can therefore be evaluated. The use of RPN makes the evolutionary operations simpler and therefore reduces the computational cost. To prevent overfitting, the concept of minimum description length is used to penalize over-complicated rules. A fitness function is therefore composed of the Fisher ratio and the use of minimum description length for an efficient evolutionary process. In the application to four protease datasets (Trypsin, Factor Xa, Hepatitis C Virus and HIV protease cleavage site prediction), our algorithm is superior to C5, a conventional method for deriving decision trees.},
  added-at = {2008-06-19T17:35:00.000+0200},
  author = {Yang, Zheng Rong and Thomson, Rebecca and Hodgman, T. Charles and Dry, Jonathan and Doyle, Austin K. and Narayanan, Ajit and Wu, XiKun},
  biburl = {https://www.bibsonomy.org/bibtex/265ac931186e0c5a01933979ff40860d7/brazovayeye},
  doi = {10.1016/S0303-2647(03)00141-2},
  interhash = {9a31690413ed226b1d0089576211e51c},
  intrahash = {65ac931186e0c5a01933979ff40860d7},
  journal = {Biosystems},
  keywords = {Amino Polish Proteolytic The acid algorithms, analysis cleavage genetic matrix, notation, programming, reverse similarity},
  month = {November},
  number = {1-2},
  pages = {159--176},
  timestamp = {2008-06-19T17:54:48.000+0200},
  title = {Searching for discrimination rules in protease proteolytic cleavage activity using genetic programming with a min-max scoring function},
  url = {http://www.sciencedirect.com/science/article/B6T2K-49N9DN6-2/2/0d63ebb7904ac33ae0d20ce4f6477a57},
  volume = 72,
  year = 2003
}
