Ensemble of decision trees for neuronal source localization of the brain

Abstract Book of the XXIII IUPAP International Conference on Statistical Physics, Genova, Italy, (9-13 July 2007)

Abstract

The task of extracting knowledge from EEG data arises in many applications: bioinformatics, medicine, brain-computer interfaces, security-system development, expert-system creation, and others. Electroencephalography (EEG) is the neurophysiological measurement of the electrical activity of the brain, recorded from electrodes placed on the scalp. EEGs are frequently used in experiments because the recording is non-invasive for the research subject. This paper describes an approach, based on an ensemble of decision trees, for extracting knowledge from EEG data for neural source localization of electrical brain activity. The main idea of the new method is to treat the source parameters as decision-tree attributes, processed in parallel, instead of handling the raw space-time measurements directly. If the sources are spatially localized, this approach also gives an additional advantage in data compression. The suggested algorithm constructs and classifies a training database for the ensemble of trees according to the value of the residual relative energy (RRE) error, which estimates the difference between the model potential distribution on the scalp and the measured EEG potential. An introduced RRE threshold splits each set of attribute values (dipole parameters) into two classes: good dipole positions (low error, first class) and bad positions (high error, second class). The parameters of the dipole sources, 3N coordinates and 3N current densities (where N is the number of dipoles), are the attributes of the decision trees. As the number of dipoles increases, learning from the whole database becomes inefficient, so we suggest two ways of selecting attributes for each decision tree. The first is based on the random subspace method: the training set consists of a randomly chosen set of attributes whose number is much smaller than the size of the initial database. For each time moment we then obtain a random decision tree that is fully grown, without pruning.
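The RRE-based labeling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the example threshold value are assumptions, and the model potentials would in practice come from a forward model of the candidate dipole parameters.

```python
def rre(measured, model):
    """Residual relative energy: ||measured - model||^2 / ||measured||^2,
    comparing the measured EEG scalp potentials with the model potentials
    produced by a candidate set of dipole parameters."""
    num = sum((m - s) ** 2 for m, s in zip(measured, model))
    den = sum(m ** 2 for m in measured)
    return num / den

def label_dipole(measured, model, threshold):
    """Class 1 = good dipole position (RRE below threshold),
    class 2 = bad position (RRE at or above threshold)."""
    return 1 if rre(measured, model) < threshold else 2

# Toy usage: a model that reproduces the measurement is class 1,
# a model that explains nothing is class 2 (threshold chosen arbitrarily).
measured = [1.0, 2.0, 3.0]
print(label_dipole(measured, [1.0, 2.0, 3.0], threshold=0.2))  # class 1
print(label_dipole(measured, [0.0, 0.0, 0.0], threshold=0.2))  # class 2
```

Labeled attribute vectors of this kind form the training database from which each tree in the ensemble is grown.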
The ensemble defines disjoint regions of probable dipole positions. Combining the decision trees over all time windows and using weighted voting based on the RRE estimate, the final dipole position is found. Randomized attribute selection can also be applied per node: at each node of the tree, m variables are chosen at random, and the best split is calculated over these m variables in the training set. The second way is to fix the number of attributes for each time moment. For example, for the first 1/3 of the time points only the position attributes (the 3N dipole coordinates) are used; over the remaining 2/3 of the points the dipole-density attributes are added iteratively, and the same voting procedure is applied as in the first case. This method works effectively because of the ambiguity of the task; nevertheless, additional Tikhonov regularization is used for large datasets. A specific character of biomedical data is that it is mixed and noisy: in particular, changes in the subject's emotional state can modify the signal noticeably, so a robust processing method is needed. Decision trees give a piecewise approximation of the input data parameters, which leads to robust approximation of missing and noisy data. Constructing full decision trees over many data sets produces a highly accurate classifier. The ensemble is a committee of trees, each of which is constructed independently. In this paper we describe the method of parallel ensemble learning and voting. The proposed parallelization methods were tested on noisy model databases, on real filtered data, and on rough EEG signals.
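The two randomization and voting mechanisms above can be illustrated with a short sketch. All names here are hypothetical, and the inverse-RRE weighting is one plausible reading of "weighted voting based on RRE estimation", not a detail confirmed by the abstract.

```python
import random

def random_subspace(attributes, m, seed=None):
    """Random subspace method: pick m of the 6N dipole attributes at
    random, m being much smaller than the full attribute count."""
    rng = random.Random(seed)
    return rng.sample(attributes, m)

def weighted_vote(tree_predictions, rre_errors):
    """Combine the per-time-window tree predictions, weighting each
    tree's vote by the inverse of its RRE error (lower error, more
    trusted); returns the winning predicted dipole position."""
    scores = {}
    for pred, err in zip(tree_predictions, rre_errors):
        scores[pred] = scores.get(pred, 0.0) + 1.0 / (err + 1e-9)
    return max(scores, key=scores.get)

# Toy usage: three trees vote on a dipole position label; the tree
# with the smallest RRE error dominates the weighted vote.
attrs = [f"attr{i}" for i in range(12)]          # 6N attributes, N = 2
subset = random_subspace(attrs, m=4, seed=0)     # 4 << 12
winner = weighted_vote(["A", "B", "A"], [0.5, 0.01, 0.5])
print(subset, winner)
```

The per-node variant would instead call `random_subspace` once at every split candidate, which is the familiar random-forest style of attribute sampling.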
