While generative models such as Latent Dirichlet Allocation (LDA) have proven
fruitful in topic modeling, they often require detailed assumptions and careful
specification of hyperparameters. Such model complexity issues only compound
when trying to generalize generative models to incorporate human input. We
introduce Correlation Explanation (CorEx), an alternative approach to topic
modeling that does not assume an underlying generative model, and instead
learns maximally informative topics through an information-theoretic framework.
This framework naturally generalizes to hierarchical and semi-supervised
extensions with no additional modeling assumptions. In particular, word-level
domain knowledge can be flexibly incorporated within CorEx through anchor
words, allowing topic separability and representation to be promoted with
minimal human intervention. Across a variety of datasets, metrics, and
experiments, we demonstrate that CorEx produces topics that are comparable in
quality to those produced by unsupervised and semi-supervised variants of LDA.
[1611.10277] Anchored Correlation Explanation: Topic Modeling with Minimal Domain Knowledge