Abstract
This paper summarises the use of a genetic programming
(GP) system to develop classification rules for gene
expression data; such rules hold promise for the
development of new molecular diagnostics. The work
focuses on discovering simple, accurate rules that
diagnose diseases based on changes in the gene
expression profile within a diseased cell. GP is shown
to be a useful technique for discovering classification
rules in a supervised learning mode, where the
biological genotype is paired with a biological
phenotype such as a disease state. Developing these
rules required new techniques for establishing fitness
and for interpreting the results of evolutionary runs,
because the data contain a large number of independent
variables and a comparatively small number of samples.
These techniques are described, and two further issues
are discussed: overfitting caused by small sample sizes,
and the behaviour of the GP system when variables are
missing from the samples.