Intelligent agents should be able to learn useful representations by
observing changes in their environment. We model such observations as pairs of
non-i.i.d. images sharing at least one of the underlying factors of variation.
First, we theoretically show that only knowing how many factors have changed,
but not which ones, is sufficient to learn disentangled representations.
Second, we provide practical algorithms that learn disentangled representations
from pairs of images without requiring annotation of groups, individual
factors, or the number of factors that have changed. Third, we perform a
large-scale empirical study and show that such pairs of observations are
sufficient to reliably learn disentangled representations on several benchmark
data sets. Finally, we evaluate our learned representations and find that they
are simultaneously useful on a diverse suite of tasks, including generalization
under covariate shifts, fairness, and abstract reasoning. Overall, our results
demonstrate that weak supervision enables learning of useful disentangled
representations in realistic scenarios.