
Stiffness: A New Perspective on Generalization in Neural Networks

(2019). arXiv:1901.09491.

Abstract

We investigate neural network training and generalization using the concept of stiffness. We measure how stiff a network is by asking how a small gradient step taken on one example affects the loss on another example. In particular, we study how stiffness depends on 1) class membership, 2) distance between data points in input space, 3) training iteration, and 4) learning rate. We experiment on MNIST, Fashion-MNIST, and CIFAR-10 using fully connected and convolutional neural networks. Our results demonstrate that stiffness is a useful concept for diagnosing and characterizing generalization. We observe that small learning rates reliably lead to higher stiffness at a given epoch as well as at a given training loss. In addition, we measure how the stiffness between two data points depends on their mutual input-space distance, and establish the concept of a dynamical critical length: the distance over which data points react similarly to gradient updates. The dynamical critical length decreases during training, and the higher the learning rate, the smaller the critical length.
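The stiffness measure is built on the alignment of per-example loss gradients: the sign variant is +1 when a small gradient step on one example would also reduce the loss on the other, and -1 when it would increase it. Below is a minimal sketch of that quantity, assuming a toy logistic-regression model; the function names and the NumPy setup are illustrative, not from the paper.

```python
import numpy as np

def per_example_gradient(w, x, y):
    # Gradient of the logistic loss -[y*log(p) + (1-y)*log(1-p)]
    # with p = sigmoid(w . x), taken at a single example (x, y), y in {0, 1}.
    p = 1.0 / (1.0 + np.exp(-np.dot(w, x)))
    return (p - y) * x

def sign_stiffness(w, x1, y1, x2, y2):
    # Sign stiffness: +1 if a small gradient step on example 1 would also
    # lower the loss on example 2 (aligned gradients), -1 if it would raise it.
    g1 = per_example_gradient(w, x1, y1)
    g2 = per_example_gradient(w, x2, y2)
    return float(np.sign(np.dot(g1, g2)))

# Toy check: a point and a close same-label neighbor are typically
# stiff (+1). Averaging this quantity over many pairs, binned by input
# distance, traces out the stiffness-vs-distance curve whose decay
# defines the dynamical critical length described in the abstract.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
x1 = rng.normal(size=5)
x2 = x1 + 0.01 * rng.normal(size=5)
print(sign_stiffness(w, x1, 1.0, x2, 1.0))
```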
