Abstract
In many settings, we have multiple data sets (also called views) that capture
different and overlapping aspects of the same phenomenon. We are often
interested in finding patterns that are unique to one or to a subset of the
views. For example, we might have one set of molecular observations and one set
of physiological observations on the same group of individuals, and we want to
quantify molecular patterns that are uncorrelated with physiology. Despite
being a common problem, this is highly challenging when the correlations come
from complex distributions. In this paper, we develop the general framework of
Rich Component Analysis (RCA) to model settings where the observations from
different views are driven by different sets of latent components, and each
component can be a complex, high-dimensional distribution. We introduce
algorithms based on cumulant extraction that provably learn each of the
components without having to model the other components. We show how to
integrate RCA with stochastic gradient descent into a meta-algorithm for
learning general models, and demonstrate substantial improvement in accuracy on
several synthetic and real datasets in both supervised and unsupervised tasks.
Our method makes it possible to learn latent variable models when we don't have
samples from the true model but only samples after complex perturbations.
Users
Please
log in to take part in the discussion (add own reviews or comments).