Abstract
Large CNNs have delivered impressive performance in various computer vision
applications, but their storage and computation requirements make these models
difficult to deploy on mobile devices. Recently, tensor decompositions
have been used for speeding up CNNs. In this paper, we further develop the
tensor decomposition technique. We propose a new algorithm for computing a
low-rank tensor decomposition that removes the redundancy in the convolution
kernels. The algorithm finds the exact global optimizer of the decomposition
and is more effective than iterative methods. Based on the decomposition, we
further propose a new method for training low-rank constrained CNNs from
scratch. Interestingly, while achieving a significant speedup, the low-rank
constrained CNNs sometimes deliver significantly better performance than their
non-constrained counterparts. On the CIFAR-10 dataset, the proposed low-rank
NIN model achieves $91.31\%$ accuracy (without data augmentation), which also
improves upon the state-of-the-art result. We evaluated the proposed method on
the CIFAR-10 and ILSVRC12 datasets for a variety of modern CNNs, including AlexNet,
NIN, VGG, and GoogleNet, with success. For example, the forward time of VGG-16 is
reduced by half while the performance remains comparable. This empirical success
suggests that low-rank tensor decompositions can be a very useful tool for
speeding up large CNNs.