We present a formulation of deep learning that aims at producing a large
margin classifier. The notion of margin, the minimum distance to a decision
boundary, has served as the foundation of several theoretically profound and
empirically successful results for both classification and regression tasks.
However, most large margin algorithms are applicable only to shallow models
with a preset feature representation, and conventional margin methods for
neural networks enforce a margin only at the output layer. Such methods are
therefore not well suited for deep networks.
In this work, we propose a novel loss function to impose a margin on any
chosen set of layers of a deep network (including input and hidden layers). Our
formulation allows choosing any norm for the metric that measures the margin. We
demonstrate that the decision boundary obtained by our loss has nice properties
compared to standard classification loss functions. Specifically, we show
improved empirical results on the MNIST, CIFAR-10, and ImageNet datasets across
multiple tasks: generalization from small training sets, corrupted labels, and
robustness against adversarial perturbations. The resulting loss is general and
complementary to existing data augmentation (such as random/adversarial input
transforms) and regularization techniques (such as weight decay, dropout, and
batch normalization).
%0 Generic
%1 citeulike:14568173
%A xxx,
%D 2018
%K classification loss
%T Large Margin Deep Networks for Classification
%U http://arxiv.org/abs/1803.05598
@misc{citeulike:14568173,
added-at = {2019-02-27T22:23:29.000+0100},
archiveprefix = {arXiv},
author = {xxx},
biburl = {https://www.bibsonomy.org/bibtex/22a4826db86d229ea0cdc3c43f30a9268/nmatsuk},
citeulike-article-id = {14568173},
citeulike-linkout-0 = {http://arxiv.org/abs/1803.05598},
citeulike-linkout-1 = {http://arxiv.org/pdf/1803.05598},
day = 15,
eprint = {1803.05598},
interhash = {935f2d2e9ba0db84e9848afb075e2587},
intrahash = {2a4826db86d229ea0cdc3c43f30a9268},
keywords = {classification loss},
month = mar,
posted-at = {2018-04-13 09:06:35},
priority = {0},
timestamp = {2019-02-27T22:23:29.000+0100},
title = {{Large Margin Deep Networks for Classification}},
url = {http://arxiv.org/abs/1803.05598},
year = 2018
}