@article{kosiorek2019stacked,
abstract = {An object can be seen as a geometrically organized set of interrelated parts.
A system that makes explicit use of these geometric relationships to recognize
objects should be naturally robust to changes in viewpoint, because the
intrinsic geometric relationships are viewpoint-invariant. We describe an
unsupervised version of capsule networks, in which a neural encoder, which
looks at all of the parts, is used to infer the presence and poses of object
capsules. The encoder is trained by backpropagating through a decoder, which
predicts the pose of each already discovered part using a mixture of pose
predictions. The parts are discovered directly from an image, in a similar
manner, by using a neural encoder, which infers parts and their affine
transformations. The corresponding decoder models each image pixel as a mixture
of predictions made by affine-transformed parts. We learn object- and
part-capsules on unlabeled data, and then cluster the vectors of presences of
object capsules. When told the names of these clusters, we achieve
state-of-the-art results for unsupervised classification on SVHN (55%) and near
state-of-the-art on MNIST (98.5%).},
added-at = {2019-06-18T21:10:26.000+0200},
author = {Kosiorek, Adam R. and Sabour, Sara and Teh, Yee Whye and Hinton, Geoffrey E.},
biburl = {https://www.bibsonomy.org/bibtex/290789c6aa5e91fc9a65bb5c6c030f9db/kirk86},
description = {[1906.06818] Stacked Capsule Autoencoders},
interhash = {4ab126143f0443b0202ed539b51a08e0},
intrahash = {90789c6aa5e91fc9a65bb5c6c030f9db},
keywords = {deep-learning},
note = {cite arxiv:1906.06818. Comment: 13 pages, 6 figures, 4 tables},
timestamp = {2019-06-18T21:10:26.000+0200},
title = {Stacked Capsule Autoencoders},
url = {http://arxiv.org/abs/1906.06818},
year = 2019
}