Abstract
Shallow supervised 1-hidden-layer neural networks have a number of favorable
properties that make them easier to interpret, analyze, and optimize than their
deep counterparts, but lack their representational power. Here we use 1-hidden-layer
learning problems to sequentially build deep networks layer by layer,
which can inherit properties from shallow networks. Contrary to previous
approaches using shallow networks, we focus on problems where deep learning is
reported as critical for success. We thus study CNNs on image classification
tasks using the large-scale ImageNet dataset and the CIFAR-10 dataset. Using a
simple set of ideas for architecture and training, we find that sequentially
solving 1-hidden-layer auxiliary problems leads to a CNN that exceeds AlexNet
performance on ImageNet. Extending this training methodology to construct
individual layers by solving 2- and 3-hidden-layer auxiliary problems, we obtain
an 11-layer network that exceeds several members of the VGG model family on
ImageNet, and can train a VGG-11 model to the same accuracy as end-to-end
learning. To our knowledge, this is the first competitive alternative to
end-to-end training of CNNs that can scale to ImageNet. We illustrate several
interesting properties of these models theoretically and conduct a range of
experiments to study the properties this training induces on the intermediate
layers.
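
For concreteness, the following is a minimal PyTorch sketch of the sequential
layer-wise scheme the abstract describes: each convolutional block is trained
against a small auxiliary classifier, then frozen before the next block is
stacked on top. The block and head architectures, optimizer settings, and
helper names (make_block, make_aux_head, train_greedy) are illustrative
assumptions, not the paper's exact configuration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def make_block(in_ch, out_ch):
        # One convolutional block to be trained greedily; the width, kernel
        # size, and pooling here are illustrative, not the paper's design.
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(2),
        )

    def make_aux_head(ch, n_classes):
        # Auxiliary classifier: pooling followed by a linear layer. Together
        # with the block above, this forms the 1-hidden-layer auxiliary
        # problem; 2- and 3-hidden-layer variants would deepen this head.
        return nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                             nn.Linear(ch, n_classes))

    def train_greedy(loader, widths, n_classes, epochs_per_block, device="cpu"):
        frozen = []            # blocks already solved and frozen
        in_ch = 3
        for out_ch in widths:
            block = make_block(in_ch, out_ch).to(device)
            head = make_aux_head(out_ch, n_classes).to(device)
            opt = torch.optim.SGD(
                list(block.parameters()) + list(head.parameters()),
                lr=0.1, momentum=0.9)
            for _ in range(epochs_per_block):
                for x, y in loader:
                    x, y = x.to(device), y.to(device)
                    with torch.no_grad():   # earlier layers give fixed features
                        for b in frozen:
                            x = b(x)
                    loss = F.cross_entropy(head(block(x)), y)
                    opt.zero_grad()
                    loss.backward()
                    opt.step()
            for p in block.parameters():    # freeze the newly solved layer
                p.requires_grad_(False)
            frozen.append(block.eval())
            in_ch = out_ch
        return nn.Sequential(*frozen)       # deep network built layer by layer

Note that gradients never flow into previously trained blocks: each auxiliary
problem is a shallow optimization over one new block and its head, which is
what lets the procedure inherit the favorable properties of shallow networks.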