Abstract
Despite important advances in microarray-based
molecular classification of tumours, its application in
clinical settings remains a formidable challenge. This is in part
due to the limitation of current analysis programs in
discovering robust biomarkers and developing
classifiers with a practical set of genes. Genetic
programming (GP) is a type of machine learning
technique that uses an evolutionary algorithm to simulate
natural selection as well as population dynamics, hence
leading to simple and comprehensible classifiers. Here
we applied GP to cancer expression profiling data to
select feature genes and build molecular classifiers by
mathematical integration of these genes. Analysis of
thousands of GP classifiers generated for a prostate
cancer data set revealed repetitive use of a set of
highly discriminative feature genes, many of which are
known to be disease associated. GP classifiers often
comprise five or fewer genes and successfully predict
cancer types and subtypes. More importantly, GP
classifiers generated in one study are able to predict
samples from an independent study, which may have used
different microarray platforms. In addition, GP yielded
classification accuracy better than or similar to
conventional classification methods. Furthermore, the
mathematical expression of GP classifiers provides
insights into relationships between classifier genes.
Taken together, our results demonstrate that GP may be
valuable for generating effective classifiers
containing a practical set of genes for
diagnostic/prognostic cancer classification.
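
To make the idea of a classifier built by "mathematical integration" of a few genes concrete, the following is a minimal illustrative sketch, not the method used in the study. It assumes a GP classifier can be represented as a small arithmetic expression tree over gene expression values whose output is compared with a threshold. The gene names (GENE_A, GENE_B, GENE_C), the class labels, the threshold and all expression values are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, Set, Union

# A sample maps gene identifiers to expression values (hypothetical data).
Sample = Dict[str, float]


@dataclass(frozen=True)
class Gene:
    """Leaf node: looks up one gene's expression value in a sample."""
    name: str

    def evaluate(self, sample: Sample) -> float:
        return sample[self.name]


@dataclass(frozen=True)
class BinOp:
    """Internal node: combines two subtrees with an arithmetic operator."""
    op: Callable[[float, float], float]
    left: "Node"
    right: "Node"

    def evaluate(self, sample: Sample) -> float:
        return self.op(self.left.evaluate(sample), self.right.evaluate(sample))


Node = Union[Gene, BinOp]


def classify(tree: Node, sample: Sample, threshold: float = 0.0) -> str:
    """Assumed decision rule: output above the threshold means 'tumour'."""
    return "tumour" if tree.evaluate(sample) > threshold else "normal"


def genes_used(tree: Node) -> Set[str]:
    """Collect the feature genes that appear in a classifier tree."""
    if isinstance(tree, Gene):
        return {tree.name}
    return genes_used(tree.left) | genes_used(tree.right)


# Hypothetical three-gene rule: (GENE_A - GENE_B) * GENE_C > 0
rule = BinOp(
    operator.mul,
    BinOp(operator.sub, Gene("GENE_A"), Gene("GENE_B")),
    Gene("GENE_C"),
)

sample_1 = {"GENE_A": 5.0, "GENE_B": 2.0, "GENE_C": 1.5}
sample_2 = {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 1.5}

print(classify(rule, sample_1))   # (5.0 - 2.0) * 1.5 = 4.5, above threshold
print(classify(rule, sample_2))   # (1.0 - 2.0) * 1.5 = -1.5, below threshold
print(sorted(genes_used(rule)))
```

In a full GP run, many such trees would be generated, scored on training samples and recombined over successive generations. Counting how often each gene appears across the resulting trees, as `genes_used` allows, is one way to look for repeatedly selected feature genes.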