Discovering Structure in High-Dimensional Data Through Correlation Explanation.
G. Steeg and A. Galstyan (2014). arXiv:1406.1222. Comment: 15 pages, 6 figures. Includes supplementary material and link to code. Published in the proceedings of the 28th Annual Conference on Neural Information Processing Systems, NIPS 2014.
We introduce a method to learn a hierarchy of successively more abstract
representations of complex data based on optimizing an information-theoretic
objective. Intuitively, the optimization searches for a set of latent factors
that best explain the correlations in the data as measured by multivariate
mutual information. The method is unsupervised, requires no model assumptions,
and scales linearly with the number of variables, which makes it an attractive
approach for very high-dimensional systems. We demonstrate that Correlation
Explanation (CorEx) automatically discovers meaningful structure for data from
diverse sources including personality tests, DNA, and human language.
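
To make "correlations ... as measured by multivariate mutual information" concrete, the sketch below assumes the quantity meant is total correlation, TC(X) = sum_i H(X_i) - H(X_1, ..., X_n), and that a latent factor Y "explains" correlations to the extent that TC(X) - TC(X | Y) is large. It uses plain empirical entropy estimates on a toy discrete dataset. The function names are illustrative only and are not taken from the authors' code, and this brute-force estimate is not the CorEx optimization itself, which the abstract says scales linearly in the number of variables.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a sequence of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def total_correlation(rows):
    """Estimate TC(X) = sum_i H(X_i) - H(X_1, ..., X_n) from rows of discrete data."""
    rows = [tuple(r) for r in rows]
    columns = list(zip(*rows))
    return sum(entropy(col) for col in columns) - entropy(rows)


def conditional_total_correlation(rows, labels):
    """Estimate TC(X | Y): total correlation within each group Y = y, weighted by p(y)."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(y, []).append(row)
    n = len(rows)
    return sum(len(g) / n * total_correlation(g) for g in groups.values())


# Toy data (hypothetical): three binary variables that all copy one hidden bit.
hidden = [0, 1, 0, 1, 1, 0, 0, 1]
data = [(h, h, h) for h in hidden]

tc = total_correlation(data)
tc_given_hidden = conditional_total_correlation(data, hidden)
explained = tc - tc_given_hidden  # correlation accounted for by the hidden bit
```

In this toy case the hidden bit accounts for all of the dependence among the three variables, so the conditional total correlation vanishes and `explained` equals `tc`. On real data the joint entropy estimate degrades quickly as the number of variables grows, so this direct computation is only practical for small examples.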