Abstract
Deep Learning (DL) algorithms are the central focus of modern machine
learning systems. As data volumes keep growing, it has become customary to
train large neural networks with hundreds of millions of parameters to maintain
enough capacity to memorize these volumes and obtain state-of-the-art accuracy.
To get around the costly computations associated with large models and data,
the community is increasingly investing in specialized hardware for model
training. However, specialized hardware is expensive and hard to generalize to
a multitude of tasks. Progress on the algorithmic front has so far failed to
demonstrate a direct advantage over powerful hardware such as NVIDIA V100 GPUs.
This paper provides an exception. We propose SLIDE (Sub-LInear Deep learning
Engine), which uniquely blends smart randomized algorithms with multi-core
parallelism and workload optimization. Using just a CPU, SLIDE drastically
reduces the computations during both training and inference, outperforming an
optimized implementation of TensorFlow (TF) on the best available GPU. Our
evaluations on industry-scale recommendation datasets, with large fully
connected architectures, show that training with SLIDE on a 44-core CPU is more
than 3.5 times (1 hour vs. 3.5 hours) faster than the same network trained
using TF on a Tesla V100, at any given accuracy level. On the same CPU hardware,
SLIDE is over 10x faster than TF. We provide the code and scripts for
reproducibility.
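
For context, the "smart randomized algorithm" at the heart of SLIDE is adaptive sparsification via locality-sensitive hashing (LSH): the neurons of each layer are indexed in hash tables, and for each input only the small set of neurons retrieved from those tables (the ones likely to produce large activations) is computed and updated. The sketch below is an illustrative toy in Python, not the authors' implementation; the SimHash family, the table parameters, and all names (SimHashTable, sparse_layer_forward, num_bits, etc.) are assumptions chosen for clarity.

```python
# Toy sketch (NOT the authors' code) of LSH-based sparse forward computation:
# index neuron weight vectors in hash tables, then for a given input compute
# only the neurons that collide with it, instead of a full dense matmul.
import numpy as np

class SimHashTable:
    """Buckets vectors by the sign pattern of random projections (SimHash)."""
    def __init__(self, dim, num_bits=10, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((num_bits, dim))
        self.buckets = {}

    def _code(self, v):
        # Hash code = which side of each random hyperplane v falls on.
        return tuple((self.planes @ v > 0).astype(int))

    def insert(self, key, v):
        self.buckets.setdefault(self._code(v), set()).add(key)

    def query(self, v):
        # Vectors with high cosine similarity to v collide with high probability.
        return self.buckets.get(self._code(v), set())

def sparse_layer_forward(x, W, b, tables):
    """Compute ReLU activations only for neurons retrieved from the LSH tables."""
    active = set()
    for t in tables:
        active |= t.query(x)
    # Dense cost would be O(dim * num_neurons); here we pay only for |active|.
    out = np.zeros(W.shape[0])
    for j in sorted(active):
        out[j] = max(0.0, W[j] @ x + b[j])
    return out, active

# Toy usage: index 10,000 neurons across 4 tables, then evaluate one input.
dim, n_neurons = 128, 10_000
rng = np.random.default_rng(1)
W = rng.standard_normal((n_neurons, dim))
b = np.zeros(n_neurons)
tables = [SimHashTable(dim, num_bits=10, seed=s) for s in range(4)]
for j in range(n_neurons):
    for t in tables:
        t.insert(j, W[j])

x = rng.standard_normal(dim)
out, active = sparse_layer_forward(x, W, b, tables)
print(f"computed {len(active)} of {n_neurons} neurons")
```

The savings come from touching only the retrieved neurons in both the forward and backward passes; because different inputs hit different buckets, the work also parallelizes naturally across CPU cores, which is the parallelism the abstract alludes to.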