Abstract
Many predictive tasks, such as diagnosing a patient based on their medical
chart, are ultimately defined by the decisions of human experts. Unfortunately,
encoding experts' knowledge is often time-consuming and expensive. We propose a
simple way to use fuzzy and informal knowledge from experts to guide discovery
of interpretable latent topics in text. The underlying intuition of our
approach is that latent factors should be informative about both correlations
in the data and a set of relevance variables specified by an expert.
Mathematically, this approach is a combination of the information bottleneck
and Total Correlation Explanation (CorEx). We give a preliminary evaluation of
Anchored CorEx, showing that it produces more coherent and interpretable topics
on two distinct corpora.