
Pachinko allocation: DAG-structured mixture models of topic correlations

Wei Li and Andrew McCallum. Proceedings of the 23rd International Conference on Machine Learning (ICML), 2006.

Abstract

Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, LDA does not capture correlations between topics. In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). The leaves of the DAG represent individual words in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics). PAM provides a flexible alternative to recent work by Blei and Lafferty (2006), which captures correlations only between pairs of topics. Using text data from newsgroups, historic NIPS proceedings and other research paper corpora, we show improved performance of PAM in document classification, likelihood of held-out data, the ability to support finer-grained topics, and topical keyword coherence.
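The abstract describes the model's structure rather than its inference procedure: interior DAG nodes mix over their children via node-specific distributions, and leaves are vocabulary words. The sketch below is a minimal, hypothetical illustration of that generative walk (not the authors' code); the `Node` class, symmetric Dirichlet priors, and the tiny example DAG are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

class Node:
    """A DAG node: interior nodes hold children (topics or words); leaves are words."""
    def __init__(self, name, children=None, alpha=1.0):
        self.name = name
        self.children = children or []  # empty list => leaf, i.e. a vocabulary word
        # Each interior node carries a Dirichlet prior over its children.
        self.alpha = np.full(len(self.children), alpha) if self.children else None

def sample_document_multinomials(root):
    """Per document, draw one multinomial over children for every interior node."""
    theta, stack = {}, [root]
    while stack:
        node = stack.pop()
        if node.children:
            theta[node.name] = rng.dirichlet(node.alpha)
            stack.extend(node.children)
    return theta

def sample_word(root, theta):
    """Generate one token by walking from the root to a leaf through the DAG."""
    node = root
    while node.children:
        idx = rng.choice(len(node.children), p=theta[node.name])
        node = node.children[idx]
    return node.name

# Hypothetical four-level DAG: root -> super-topics -> sub-topics -> words.
# Note that sub2 has two parents, which is what lets PAM express sparse,
# overlapping correlations that a strict tree cannot.
words = [Node(w) for w in ["model", "topic", "data", "graph"]]
sub1, sub2 = Node("sub1", words[:3]), Node("sub2", words[1:])
root = Node("root", [Node("super1", [sub1, sub2]), Node("super2", [sub2])])

theta = sample_document_multinomials(root)
print([sample_word(root, theta) for _ in range(5)])
```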

Description

Learns a topic hierarchy structured as an arbitrary DAG.


Tags

Community

  • @marie_brei
  • @ans
  • @gregoryy
  • @dblp
  • @tberg