D. Blei and J. Lafferty. Advances in Neural Information Processing Systems, (2006)
Abstract
Topic models, such as latent Dirichlet allocation (LDA), can be useful
tools for the statistical analysis of document collections and other
discrete data. The LDA model assumes that the words of each document
arise from a mixture of topics, each of which is a distribution over the
vocabulary. A limitation of LDA is the inability to model topic correlation
even though, for example, a document about genetics is more likely to
also be about disease than x-ray astronomy. This limitation stems from
the use of the Dirichlet distribution to model the variability among the
topic proportions. In this paper we develop the correlated topic model
(CTM), where the topic proportions exhibit correlation via the logistic
normal distribution [1]. We derive a mean-field variational inference
algorithm for approximate posterior inference in this model, which is
complicated by the fact that the logistic normal is not conjugate to the
multinomial. The CTM gives a better fit than LDA on a collection of OCRed
articles from the journal Science. Furthermore, the CTM provides a
natural way of visualizing and exploring this and other unstructured data
sets.
%0 Journal Article
%1 blei2006ctm
%A Blei, D.
%A Lafferty, J.
%D 2006
%I MIT Press
%J Advances in Neural Information Processing Systems
%K lda master toVerify topics
%P 147
%T Correlated Topic Models
%U http://www.cs.cmu.edu/~lafferty/pub/ctm.pdf
%V 18
%X Topic models, such as latent Dirichlet allocation (LDA), can be useful
tools for the statistical analysis of document collections and other
discrete data. The LDA model assumes that the words of each document
arise from a mixture of topics, each of which is a distribution over the
vocabulary. A limitation of LDA is the inability to model topic correlation
even though, for example, a document about genetics is more likely to
also be about disease than x-ray astronomy. This limitation stems from
the use of the Dirichlet distribution to model the variability among the
topic proportions. In this paper we develop the correlated topic model
(CTM), where the topic proportions exhibit correlation via the logistic
normal distribution [1]. We derive a mean-field variational inference
algorithm for approximate posterior inference in this model, which is
complicated by the fact that the logistic normal is not conjugate to the
multinomial. The CTM gives a better fit than LDA on a collection of OCRed
articles from the journal Science. Furthermore, the CTM provides a
natural way of visualizing and exploring this and other unstructured data
sets.
@article{blei2006ctm,
abstract = {Topic models, such as latent Dirichlet allocation (LDA), can be useful
tools for the statistical analysis of document collections and other
discrete data. The LDA model assumes that the words of each document
arise from a mixture of topics, each of which is a distribution over the
vocabulary. A limitation of LDA is the inability to model topic correlation
even though, for example, a document about genetics is more likely to
also be about disease than x-ray astronomy. This limitation stems from
the use of the Dirichlet distribution to model the variability among the
topic proportions. In this paper we develop the correlated topic model
(CTM), where the topic proportions exhibit correlation via the logistic
normal distribution [1]. We derive a mean-field variational inference
algorithm for approximate posterior inference in this model, which is
complicated by the fact that the logistic normal is not conjugate to the
multinomial. The CTM gives a better fit than LDA on a collection of OCRed
articles from the journal Science. Furthermore, the CTM provides a
natural way of visualizing and exploring this and other unstructured data
sets.},
added-at = {2010-05-11T15:38:46.000+0200},
author = {Blei, D. and Lafferty, J.},
biburl = {https://www.bibsonomy.org/bibtex/284a7198bb6c60c109a2722ea2d73593c/ans},
interhash = {823a0ada41df4deb528ca74afcdcd36f},
intrahash = {84a7198bb6c60c109a2722ea2d73593c},
journal = {Advances in Neural Information Processing Systems},
keywords = {lda master toVerify topics},
pages = 147,
publisher = {MIT Press},
timestamp = {2011-03-22T23:02:17.000+0100},
title = {{Correlated Topic Models}},
url = {http://www.cs.cmu.edu/~lafferty/pub/ctm.pdf},
volume = 18,
year = 2006
}