Abstract
Non-linear functions such as neural networks can be locally approximated by
affine planes. Recent works make use of input-Jacobians, which describe the
normal to these planes. In this paper, we introduce full-Jacobians, which
combine this normal with an additional intercept term, the bias-Jacobian;
together, the two completely describe the local planes. For ReLU neural
networks, bias-Jacobians correspond to sums of gradients of the outputs w.r.t.
intermediate layer activations.
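The decomposition can be illustrated with a minimal sketch (PyTorch; the model, shapes, and scalar output are illustrative assumptions, not the authors' implementation). The input-Jacobian is obtained by automatic differentiation, and, since a ReLU network is exactly affine within each activation region, the intercept of the local plane can be recovered as the residual of that affine fit.

```python
import torch
import torch.nn as nn

# Illustrative ReLU network with a scalar output (hypothetical architecture).
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))

x = torch.randn(1, 10, requires_grad=True)
y = model(x)  # shape (1, 1)

# Input-Jacobian: gradient of the output w.r.t. the input (the plane's normal).
input_jac, = torch.autograd.grad(y.sum(), x)  # shape (1, 10)

# Intercept of the local affine plane, recovered as the residual f(x) - J(x) x.
intercept = y - (input_jac * x).sum(dim=1, keepdim=True)

# Together, (input_jac, intercept) completely describe the local plane at x.
full_jacobian = (input_jac, intercept)
```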
We first use these full-Jacobians for distillation, aligning the gradients of
the networks' intermediate representations. Next, we regularize bias-Jacobians
alone to improve generalization. Finally, we show that full-Jacobian maps can
be viewed as saliency maps. Experimental results show improved distillation on
small datasets, improved generalization in neural network training, and sharper
saliency maps.