Sequence Discrimination Using Phase-type Distributions
J. Callut and P. Dupont. Proceedings of the 17th European Conference on Machine Learning, pages 78--89. Berlin, Heidelberg, Springer-Verlag, 2006.
DOI: 10.1007/11871842_12
Abstract
We propose in this paper a novel approach to the classification of discrete sequences. This approach builds a model fitting some dynamical features deduced from the learning sample. These features are discrete phase-type (PH) distributions. They model the first passage times (FPT) between occurrences of pairs of substrings. The PHit algorithm, an adapted version of the Expectation-Maximization algorithm, is proposed to estimate PH distributions. The most informative pairs of substrings are selected according to the Jensen-Shannon divergence between their class conditional empirical FPT distributions. The selected features are then used in two classification schemes: a maximum a posteriori (MAP) classifier and support vector machines (SVM) with marginalized kernels. Experiments on DNA splicing region detection and on protein sublocalization illustrate that the proposed techniques offer competitive results with smoothed Markov chains or SVM with a spectrum string kernel.
%0 Conference Paper
%1 Callut:2006:SDU:2091602.2091616
%A Callut, Jérôme
%A Dupont, Pierre
%B Proceedings of the 17th European Conference on Machine Learning
%C Berlin, Heidelberg
%D 2006
%E Fürnkranz, Johannes
%E Scheffer, Tobias
%E Spiliopoulou, Myra
%I Springer-Verlag
%K imported
%P 78--89
%R 10.1007/11871842_12
%T Sequence Discrimination Using Phase-type Distributions
%U http://dx.doi.org/10.1007/11871842_12
%X We propose in this paper a novel approach to the classification of discrete sequences. This approach builds a model fitting some dynamical features deduced from the learning sample. These features are discrete phase-type (PH) distributions. They model the first passage times (FPT) between occurrences of pairs of substrings. The PHit algorithm, an adapted version of the Expectation-Maximization algorithm, is proposed to estimate PH distributions. The most informative pairs of substrings are selected according to the Jensen-Shannon divergence between their class conditional empirical FPT distributions. The selected features are then used in two classification schemes: a maximum a posteriori (MAP) classifier and support vector machines (SVM) with marginalized kernels. Experiments on DNA splicing region detection and on protein sublocalization illustrate that the proposed techniques offer competitive results with smoothed Markov chains or SVM with a spectrum string kernel.
%@ 3-540-45375-X, 978-3-540-45375-8
@inproceedings{Callut:2006:SDU:2091602.2091616,
abstract = {We propose in this paper a novel approach to the classification of discrete sequences. This approach builds a model fitting some dynamical features deduced from the learning sample. These features are discrete phase-type (PH) distributions. They model the first passage times (FPT) between occurrences of pairs of substrings. The PHit algorithm, an adapted version of the Expectation-Maximization algorithm, is proposed to estimate PH distributions. The most informative pairs of substrings are selected according to the Jensen-Shannon divergence between their class conditional empirical FPT distributions. The selected features are then used in two classification schemes: a maximum a posteriori (MAP) classifier and support vector machines (SVM) with marginalized kernels. Experiments on DNA splicing region detection and on protein sublocalization illustrate that the proposed techniques offer competitive results with smoothed Markov chains or SVM with a spectrum string kernel.},
acmid = {2091616},
added-at = {2016-11-28T10:15:50.000+0100},
address = {Berlin, Heidelberg},
author = {Callut, Jérôme and Dupont, Pierre},
biburl = {https://www.bibsonomy.org/bibtex/2e656f8daac83cd79cc6644e082512f01/kde-alumni},
booktitle = {Proceedings of the 17th European Conference on Machine Learning},
doi = {10.1007/11871842_12},
editor = {Fürnkranz, Johannes and Scheffer, Tobias and Spiliopoulou, Myra},
interhash = {aec04b7740f3f65bf305e49a433c637b},
intrahash = {e656f8daac83cd79cc6644e082512f01},
isbn = {3-540-45375-X, 978-3-540-45375-8},
keywords = {imported},
location = {Berlin, Germany},
numpages = {12},
pages = {78--89},
publisher = {Springer-Verlag},
series = {ECML'06},
timestamp = {2016-11-28T10:15:50.000+0100},
title = {Sequence Discrimination Using Phase-type Distributions},
url = {http://dx.doi.org/10.1007/11871842_12},
year = {2006}
}