Pruning is a well-established technique for removing unnecessary structure
from neural networks after training to improve the performance of inference.
Several recent results have explored the possibility of pruning at
initialization time to provide similar benefits during training. In particular,
the "lottery ticket hypothesis" conjectures that typical neural networks
contain small subnetworks that can train to similar accuracy in a commensurate
number of steps. The evidence for this claim is that a procedure based on
iterative magnitude pruning (IMP) reliably finds such subnetworks retroactively
on small vision tasks. However, IMP fails on deeper networks, and proposed
methods to prune before training or train pruned networks encounter similar
scaling limitations.
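For context, the IMP procedure behind this evidence trains the dense network to completion, prunes the lowest-magnitude weights, rewinds the surviving weights to their values at initialization, and repeats. The following is a minimal, self-contained sketch of that loop on a flat weight vector; train_fn is a hypothetical stand-in for a full training run, and none of the names here come from the paper's code.

import numpy as np

def imp(init_weights, train_fn, rounds=3, prune_frac=0.2):
    # Iterative magnitude pruning (sketch): train, prune the smallest
    # surviving weights, rewind survivors to their initial values, repeat.
    mask = np.ones_like(init_weights)
    for _ in range(rounds):
        trained = train_fn(init_weights * mask, mask)   # full training run (assumed)
        alive = np.abs(trained[mask == 1])
        k = int(prune_frac * alive.size)                # number of survivors to drop
        if k == 0:
            break
        threshold = np.sort(alive)[k - 1]
        mask = mask * (np.abs(trained) > threshold)     # keep only large-magnitude weights
    return init_weights * mask, mask                    # candidate "winning ticket"

# Toy usage: "training" here just doubles the masked weights.
rng = np.random.default_rng(0)
w0 = rng.normal(size=100)
ticket, mask = imp(w0, train_fn=lambda w, m: w * 2.0)
print(int(mask.sum()), "of", mask.size, "weights survive")
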
In this paper, we argue that these efforts have struggled on deeper networks
because they have focused on pruning precisely at initialization. We modify IMP
to search for subnetworks that could have been obtained by pruning early in
training (0.1% to 7% through) rather than at iteration 0. With this change, it
finds small subnetworks of deeper networks (e.g., 80% sparsity on ResNet-50)
that can complete the training process to match the accuracy of the original
network on more challenging tasks (e.g., ImageNet). In situations where IMP
fails at iteration 0, the accuracy benefits of delaying pruning accrue rapidly
over the earliest iterations of training. To explain these behaviors, we study
subnetwork "stability," finding that - as accuracy improves in this fashion -
IMP subnetworks train to parameters closer to those of the full network and do
so with improved consistency in the face of gradient noise. These results offer
new insights into the opportunity to prune large-scale networks early in
training and the behaviors underlying the lottery ticket hypothesis.
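The modification described above changes only the rewind point: instead of resetting the subnetwork's weights to their values at iteration 0, it resets them to the dense network's weights after a small number of early training steps (0.1% to 7% of the run). A sketch of that variant, under the same illustrative assumptions as before (flat weights, a hypothetical train_fn that now also takes a step budget), might look like:

import numpy as np

def imp_with_rewind(init_weights, train_fn, total_steps, rewind_step,
                    rounds=3, prune_frac=0.2):
    # Same train-prune loop as the earlier sketch, except the subnetwork is
    # rewound to the dense network's weights after `rewind_step` steps
    # rather than to iteration 0.
    dense_mask = np.ones_like(init_weights)
    rewind_weights = train_fn(init_weights, dense_mask, rewind_step)  # early snapshot
    mask = dense_mask.copy()
    for _ in range(rounds):
        trained = train_fn(rewind_weights * mask, mask, total_steps - rewind_step)
        alive = np.abs(trained[mask == 1])
        k = int(prune_frac * alive.size)
        if k == 0:
            break
        threshold = np.sort(alive)[k - 1]
        mask = mask * (np.abs(trained) > threshold)
    return rewind_weights * mask, mask  # subnetwork starts from early-training weights

# Toy usage with a stand-in "training" function of the assumed signature.
rng = np.random.default_rng(0)
w0 = rng.normal(size=100)
ticket, mask = imp_with_rewind(w0, train_fn=lambda w, m, s: w * (1.0 + 0.01 * s),
                               total_steps=1000, rewind_step=50)

The stability analysis mentioned in the abstract would then compare, for example, the distance between such a subnetwork's trained weights and the full network's, across runs with different gradient noise; that measurement is not part of this sketch.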
Description
[1903.01611] Stabilizing the Lottery Ticket Hypothesis
%0 Journal Article
%1 frankle2019stabilizing
%A Frankle, Jonathan
%A Dziugaite, Gintare Karolina
%A Roy, Daniel M.
%A Carbin, Michael
%D 2019
%K generalization sparsity stable
%T Stabilizing the Lottery Ticket Hypothesis
%U http://arxiv.org/abs/1903.01611
@article{frankle2019stabilizing,
added-at = {2019-06-13T16:57:40.000+0200},
author = {Frankle, Jonathan and Dziugaite, Gintare Karolina and Roy, Daniel M. and Carbin, Michael},
biburl = {https://www.bibsonomy.org/bibtex/29a0f6a70be85b321c7f8ac6dbb19a0dd/kirk86},
description = {[1903.01611] Stabilizing the Lottery Ticket Hypothesis},
interhash = {a8f8ca4798b56fece84dfac2c6210f7e},
intrahash = {9a0f6a70be85b321c7f8ac6dbb19a0dd},
keywords = {generalization sparsity stable},
note = {cite arxiv:1903.01611},
timestamp = {2019-06-13T16:57:40.000+0200},
title = {Stabilizing the Lottery Ticket Hypothesis},
url = {http://arxiv.org/abs/1903.01611},
year = 2019
}