Abstract
The key idea behind the unsupervised learning of disentangled representations
is that real-world data is generated by a few explanatory factors of variation
which can be recovered by unsupervised learning algorithms. In this paper, we
provide a sober look at recent progress in the field and challenge some common
assumptions. We first theoretically show that the unsupervised learning of
disentangled representations is fundamentally impossible without inductive
biases on both the models and the data. Then, we train more than 12000 models
covering the most prominent methods and evaluation metrics in a reproducible
large-scale experimental study on seven different data sets. We observe that
while the different methods successfully enforce properties "encouraged" by the
corresponding losses, well-disentangled models seemingly cannot be identified
without supervision. Furthermore, increased disentanglement does not seem to
lead to a decreased sample complexity of learning for downstream tasks. Our
results suggest that future work on disentanglement learning should be explicit
about the role of inductive biases and (implicit) supervision, investigate
concrete benefits of enforcing disentanglement of the learned representations,
and consider a reproducible experimental setup covering several data sets.
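
For context on the "properties encouraged by the corresponding losses," the following is a minimal sketch of the kind of objective the studied methods optimize. It shows a beta-VAE, one of the prominent approaches covered by such studies, where a beta-weighted KL penalty is intended to encourage disentangled latents. The network sizes, beta value, and PyTorch framing here are illustrative assumptions, not the paper's exact experimental setup.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    """Illustrative beta-VAE; dimensions and layers are assumptions."""
    def __init__(self, x_dim=4096, z_dim=10, beta=4.0):
        super().__init__()
        self.beta = beta
        self.enc = nn.Sequential(nn.Linear(x_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, z_dim)
        self.logvar = nn.Linear(256, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(),
                                 nn.Linear(256, x_dim))

    def loss(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        # Reconstruction term (per-example average)
        recon = F.binary_cross_entropy_with_logits(
            self.dec(z), x, reduction="sum") / x.size(0)
        # KL(q(z|x) || N(0, I)); beta > 1 strengthens the pressure
        # toward independent, factorized latents
        kl = -0.5 * torch.sum(
            1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
        return recon + self.beta * kl

The paper's observation is that while such penalties measurably shape the aggregate posterior as intended, selecting a well-disentangled model among trained candidates still appears to require supervision.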