Abstract
We investigate neural network training and generalization using the concept
of stiffness. We measure how stiff a network is by looking at how a small
gradient step on one example affects the loss on another example. In
particular, we study how stiffness depends on 1) class membership, 2) distance
between data points in the input space, 3) training iteration, and 4) learning
rate. We experiment on MNIST, FASHION-MNIST, and CIFAR-10 using fully-connected
and convolutional neural networks. Our results demonstrate that stiffness is a
useful concept for diagnosing and characterizing generalization. We observe
that small learning rates reliably lead to higher stiffness at a given epoch as
well as at a given training loss. In addition, we measure how stiffness between
two data points depends on their mutual input space distance, and establish the
concept of a dynamical critical length that characterizes the distance over
which data points react similarly to gradient updates. The dynamical critical
length decreases as training progresses, and the higher the learning rate, the
smaller the critical length.
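
To make the stiffness measurement concrete, here is a minimal sketch, assuming a PyTorch model and loss function. The sign-based variant below is one natural formalization, not necessarily the paper's exact definition: an SGD step of size eps on example 1 changes the loss on example 2 by approximately -eps * (g1 . g2), so the sign of the gradient inner product records whether the step helps or hurts the other example.

```python
import torch

def sign_stiffness(model, loss_fn, x1, y1, x2, y2):
    """Hedged sketch: sign stiffness between two labeled examples.

    A gradient step of size eps on (x1, y1) changes the loss on (x2, y2)
    by roughly -eps * (g1 . g2), so sign(g1 . g2) is +1 when the step
    also reduces the loss on example 2 and -1 when it increases it.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # Per-example gradients; inputs are assumed to carry a batch dim of 1.
    g1 = torch.autograd.grad(loss_fn(model(x1), y1), params)
    g2 = torch.autograd.grad(loss_fn(model(x2), y2), params)

    # Flatten all parameter gradients into single vectors and compare signs.
    g1 = torch.cat([g.reshape(-1) for g in g1])
    g2 = torch.cat([g.reshape(-1) for g in g2])
    return torch.sign(g1 @ g2).item()
```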
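
The dynamical critical length can then be estimated by binning pair stiffness by input-space distance and locating where the mean stiffness drops to zero. The binning scheme and zero-crossing rule in this sketch are our assumptions, not the paper's stated procedure.

```python
import numpy as np

def critical_length(distances, stiffnesses, n_bins=20):
    """Hedged sketch: estimate the input-space distance at which the
    mean stiffness between data-point pairs first drops to zero.

    distances, stiffnesses: 1-D arrays over many data-point pairs.
    """
    edges = np.linspace(distances.min(), distances.max(), n_bins + 1)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (distances >= lo) & (distances < hi)
        if in_bin.any() and stiffnesses[in_bin].mean() <= 0.0:
            return 0.5 * (lo + hi)  # centre of first non-positive bin
    return None  # stiffness stayed positive over the observed range
```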