Abstract
Automated methods of machine learning may prove to be
useful in discovering biologically meaningful
information hidden in the rapidly growing databases of
DNA sequences and protein sequences. Genetic
programming is an extension of the genetic algorithm in
which a population of computer programs is bred, over a
series of generations, in order to solve a problem.
Genetic programming is capable of evolving complicated
problem-solving expressions of unspecified size and
shape. Moreover, when automatically defined functions
are added, genetic programming becomes capable of
efficiently capturing and exploiting recurring
sub-patterns. This chapter describes how
genetic programming with automatically defined
functions successfully evolved motifs for detecting the
D-E-A-D box family of proteins and for detecting the
manganese superoxide dismutase family. Both motifs were
evolved without prespecifying their length. Both
evolved motifs employed automatically defined functions
to capture the repeated use of common subexpressions.
When tested against the SWISS-PROT database of
proteins, the two genetically evolved consensus motifs
detect the two families either as well as, or slightly
better than, the comparable human-written motifs found
in the PROSITE database.
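
As a rough illustration of what it means for a motif to reuse a common subexpression, the following minimal Python sketch scans a protein sequence with a toy motif whose first two positions call the same helper. Everything here is an assumption made for illustration: the helper name adf0, the residue set {I, L, V, M}, the motif layout, and the test sequence are hypothetical, and this is neither the motif evolved in the chapter nor the PROSITE pattern.

```python
# Hypothetical illustration only: a toy motif, not the evolved one.

def adf0(residue):
    """Stand-in for an automatically defined function: a reusable
    sub-pattern that tests membership in an assumed residue set."""
    return residue in {"I", "L", "V", "M"}


def motif_matches_at(seq, i):
    """Toy motif [ILVM]-[ILVM]-D-E-A-D starting at position i.

    adf0 is called twice, mirroring how an evolved motif can reuse
    one subexpression instead of rediscovering it at each position.
    """
    window = seq[i:i + 6]
    if len(window) < 6:
        return False
    return adf0(window[0]) and adf0(window[1]) and window[2:] == "DEAD"


def scan(seq):
    """Return every start position at which the toy motif matches."""
    return [i for i in range(len(seq)) if motif_matches_at(seq, i)]


if __name__ == "__main__":
    # Made-up sequence, not taken from SWISS-PROT.
    print(scan("MKTLVDEADRG"))
```

In this sketch the motif length is fixed by hand; in the work described above, the length is not prespecified and emerges from the evolutionary run.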