Abstract
Understanding deep neural networks (DNNs) is a key challenge in the theory of
machine learning, with potential applications to the many fields where DNNs
have been successfully used. This article presents a scaling limit for a DNN
being trained by stochastic gradient descent. Our networks have a fixed (but
arbitrary) number $L \geq 2$ of inner layers; $N \gg 1$ neurons per layer; full
connections between layers; and fixed weights (or "random features" that are
not trained) near the input and output. Our results describe the evolution of
the DNN during training in the limit when $N \to +\infty$, which we relate to a
mean-field model of McKean-Vlasov type. Specifically, we show that network
weights are approximated by certain "ideal particles" whose distribution and
dependencies are described by the mean-field model. A key part of the proof is
to show existence and uniqueness for our McKean-Vlasov problem, which does not
seem to be amenable to existing theory. Our paper extends previous work on the
$L=1$ case by Mei, Montanari and Nguyen; Rotskoff and Vanden-Eijnden; and
Sirignano and Spiliopoulos. We also complement recent independent work on $L>1$
by Sirignano and Spiliopoulos (who consider a less natural scaling limit) and
Nguyen (who nonrigorously derives similar results).
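To fix ideas, the following is a minimal sketch of the kind of architecture and training scheme described above: a width-$N$ network with $L$ fully connected inner layers, fixed (untrained) random-feature layers near the input and output, and SGD acting only on the inner weights. The scalar-regression squared loss, the tanh activation, the $1/N$ averaging convention, and all names are illustrative assumptions, not the paper's exact construction.

    # Illustrative sketch only: width-N network, L trained inner layers,
    # fixed random features near input and output, plain SGD.
    import numpy as np

    rng = np.random.default_rng(0)
    d, N, L, lr = 10, 200, 3, 0.1         # input dim, width, inner layers, step size

    A = rng.normal(size=(N, d))           # fixed (untrained) input random features
    b = rng.normal(size=N)                # fixed (untrained) output weights
    W = [rng.normal(size=(N, N)) for _ in range(L)]   # trained inner-layer weights

    def forward(x):
        """Return all hidden activations and the scalar network output."""
        hs = [np.tanh(A @ x)]
        for Wl in W:
            hs.append(np.tanh(Wl @ hs[-1] / N))       # illustrative 1/N scaling
        return hs, b @ hs[-1] / N

    def sgd_step(x, y):
        """One stochastic gradient step on the inner weights for 0.5*(out - y)**2."""
        hs, out = forward(x)
        delta = (out - y) * b / N                     # gradient w.r.t. last hidden layer
        for l in reversed(range(L)):
            delta = delta * (1.0 - hs[l + 1] ** 2)    # backprop through tanh
            grad = np.outer(delta, hs[l]) / N         # gradient w.r.t. W[l]
            delta = W[l].T @ delta / N                # propagate to previous layer
            W[l] -= lr * grad

    for _ in range(1000):                             # toy data: y = sum of inputs
        x = rng.normal(size=d)
        sgd_step(x, x.sum())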