Inproceedings

Left-to-Right HDP-HMM with HDP Emission

Proceedings of the Neural Information Processing Systems Conference (NIPS), pages 1–7. Lake Tahoe, Nevada, USA, (March 2013)
DOI: 10.1109/CISS.2014.6814172

Abstract

Nonparametric Bayesian models use a Bayesian framework to learn the model complexity automatically from the data and eliminate the need for a complex model selection process. The Hierarchical Dirichlet Process hidden Markov model (HDP-HMM) is the nonparametric Bayesian equivalent of an HMM. However, HDP-HMM is restricted to an ergodic topology and uses a Dirichlet Process Mixture (DPM) to achieve a mixture distribution-like model. For applications such as speech recognition, where we deal with ordered sequences, it is desirable to impose a left-to-right structure on the model to improve its ability to model the sequential nature of the speech signal. In this paper, we introduce three enhancements to HDP-HMM: (1) a left-to-right structure, needed for sequential decoding of speech; (2) non-emitting initial and final states, required for modeling finite-length sequences; and (3) HDP mixture emissions, which allow sharing of data across states. The last of these is particularly important for speech recognition because Gaussian mixture models have been very effective at modeling speaker variability. Further, due to the nature of language, some models occur infrequently and have a small number of data points associated with them, even for large corpora. Sharing allows these models to be estimated more accurately. We demonstrate that this new HDP-HMM model produces a 15% increase in likelihoods and a 15% relative reduction in error rate on a phoneme classification task based on the TIMIT Corpus.
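The left-to-right constraint described in the abstract can be illustrated with a minimal sketch (not the paper's inference algorithm): Dirichlet-process-style stick-breaking weights are assigned only to transitions from a state to itself or to later states, yielding an upper-triangular transition matrix. The function names and the choice of a truncated GEM prior here are illustrative assumptions, not taken from the paper.

```python
import random


def stick_breaking(alpha, num_sticks, rng):
    """Sample weights from a truncated GEM(alpha) stick-breaking prior.

    Returns num_sticks + 1 weights that sum to 1; the last weight is the
    leftover stick mass.
    """
    weights, remaining = [], 1.0
    for _ in range(num_sticks):
        beta = rng.betavariate(1.0, alpha)  # beta_k ~ Beta(1, alpha)
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    weights.append(remaining)  # residual mass on the final stick
    return weights


def left_to_right_transitions(num_states, alpha, seed=0):
    """Build a left-to-right transition matrix with DP-style row weights.

    State i may transition only to states j >= i, so every row has zeros
    below the diagonal and its stick-breaking weights on the diagonal
    and above.
    """
    rng = random.Random(seed)
    matrix = []
    for i in range(num_states):
        allowed = num_states - i  # self-transition plus later states
        row_weights = stick_breaking(alpha, allowed - 1, rng)
        matrix.append([0.0] * i + row_weights)
    return matrix
```

Each row sums to one by construction, and the zero lower triangle enforces the left-to-right topology that the paper adds on top of the ergodic HDP-HMM.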
