Abstract
Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, LDA does not capture correlations between topics. In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). The leaves of the DAG represent individual words in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics). PAM provides a flexible alternative to recent work by Blei and Lafferty (2006), which captures correlations only between pairs of topics. Using text data from newsgroups, historic NIPS proceedings and other research paper corpora, we show improved performance of PAM in document classification, likelihood of held-out data, the ability to support finer-grained topics, and topical keyword coherence.
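The generative structure the abstract describes can be sketched for the common four-level case (root, super-topics, sub-topics, words): each word in a document is drawn by walking a path from the root through the DAG to a leaf. The sketch below is illustrative only; the node counts, vocabulary, and Dirichlet parameters are assumptions, not values from the paper, and the full model admits arbitrary DAGs rather than this fixed hierarchy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and hierarchy sizes (illustrative assumptions).
VOCAB = ["model", "topic", "word", "data", "graph", "node"]
N_SUPER, N_SUB = 2, 3

# Each sub-topic is a multinomial over the vocabulary, shared across documents.
phi = rng.dirichlet(np.ones(len(VOCAB)), size=N_SUB)

def sample_document(n_words):
    # Per-document multinomials: the root mixes over super-topics, and each
    # super-topic mixes over sub-topics. This nesting is what lets the model
    # represent correlations between sub-topics, unlike flat LDA.
    theta_root = rng.dirichlet(np.ones(N_SUPER))
    theta_super = rng.dirichlet(np.ones(N_SUB), size=N_SUPER)
    words = []
    for _ in range(n_words):
        s = rng.choice(N_SUPER, p=theta_root)    # pick a super-topic
        t = rng.choice(N_SUB, p=theta_super[s])  # pick a sub-topic under it
        w = rng.choice(len(VOCAB), p=phi[t])     # emit a word from the leaf
        words.append(VOCAB[w])
    return words

doc = sample_document(10)
print(doc)
```

Inference (recovering the multinomials from observed documents) is a separate matter; the sketch only shows the forward sampling path through the DAG.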
Description
Learns a topic hierarchy over an arbitrary DAG.