Abstract
Recently, fully-connected and convolutional neural networks have been trained
to achieve state-of-the-art performance on a wide variety of tasks such as
speech recognition, image classification, natural language processing, and
bioinformatics. For classification tasks, most of these "deep learning" models
employ the softmax activation function for prediction and minimize
cross-entropy loss. In this paper, we demonstrate a small but consistent
advantage of replacing the softmax layer with a linear support vector machine.
Learning minimizes a margin-based loss instead of the cross-entropy loss. While
there have been various combinations of neural nets and SVMs in prior art, our
results using L2-SVMs show that by simply replacing softmax with linear SVMs
gives significant gains on popular deep learning datasets MNIST, CIFAR-10, and
the ICML 2013 Representation Learning Workshop's face expression recognition
challenge.
Users
Please
log in to take part in the discussion (add own reviews or comments).