Abstract
Class prediction and feature selection are two learning tasks that are tightly coupled in the search for molecular profiles from
microarray data. Researchers have become aware of how easy it is to incur a selection bias effect, and complex validation setups are
required to avoid overly optimistic estimates of the predictive accuracy of the models and incorrect gene selections. This paper
describes a semisupervised pattern discovery approach that uses the by-products of complete validation studies on experimental
setups for gene profiling. In particular, we introduce the study of the patterns of single sample responses (sample-tracking profiles) to
the gene selection process induced by typical supervised learning tasks in microarray studies. We derive sample-tracking profiles as
the aggregated off-training evaluations of SVM models built on gene panels of increasing size. Genes are ranked by E-RFE, an entropy-based
variant of recursive feature elimination for support vector machines (RFE-SVM). A Dynamic Time Warping (DTW) algorithm is then
applied to define a metric between sample-tracking profiles. Unsupervised clustering based on the DTW metric makes it possible to automate
the discovery of outliers and of subtypes with different molecular profiles. Applications are described on synthetic data and in two gene
expression studies.
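
To make the DTW step concrete, the following is a minimal sketch of the textbook dynamic-programming formulation of DTW applied to two profiles. It is not the implementation used in this work: the abstract does not specify the local cost, step pattern, or any windowing constraint, so the absolute-difference cost and the unconstrained warping path are assumptions, as are the function name `dtw_distance` and the illustrative profile values (each profile is assumed to be a sequence of numbers indexed by gene panel size).

```python
def dtw_distance(profile_a, profile_b):
    """Classical DTW cost between two numeric sequences.

    Assumptions (not taken from the paper): absolute difference as the
    local cost, steps (i-1, j), (i, j-1), (i-1, j-1), no window constraint.
    """
    n, m = len(profile_a), len(profile_b)
    if n == 0 or m == 0:
        raise ValueError("profiles must be non-empty")

    inf = float("inf")
    # acc[i][j] = minimal accumulated cost of aligning the first i points
    # of profile_a with the first j points of profile_b.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(profile_a[i - 1] - profile_b[j - 1])
            acc[i][j] = local + min(
                acc[i - 1][j],      # advance in profile_a only
                acc[i][j - 1],      # advance in profile_b only
                acc[i - 1][j - 1],  # advance in both
            )
    return acc[n][m]


# Hypothetical sample-tracking profiles: one value per gene panel size.
sample_1 = [0.0, 1.0, 1.0, 2.0]
sample_2 = [0.0, 1.0, 2.0]
distance = dtw_distance(sample_1, sample_2)
```

A pairwise matrix of such distances over all samples could then be passed to any clustering routine that accepts precomputed dissimilarities; which clustering method is used here is not stated in the abstract.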