Abstract
Efficient deployment of deep learning models requires specialized neural
network architectures to best fit different hardware platforms and efficiency
constraints (defined as deployment scenarios). Traditional approaches either
manually design or use AutoML to search for a specialized neural network and
train it from scratch for each case. This is expensive and does not scale,
since the total training cost grows linearly with the number of deployment
scenarios. In this
work, we introduce Once for All (OFA), a new methodology for efficient neural
network design that handles many deployment scenarios by decoupling model
training from architecture search. Instead of training a specialized model for
each case, we propose to train a once-for-all network that supports diverse
architectural settings (depth, width, kernel size, and resolution). Given a
deployment scenario, we can later obtain a specialized sub-network by selecting
from the once-for-all network, without any additional training. As such, the
training cost of specialized models is reduced from $O(N)$ to $O(1)$. However,
it is challenging to prevent interference among the many sub-networks. We
therefore propose the progressive shrinking algorithm, which can train a
once-for-all network that supports more than $10^{19}$ sub-networks while
maintaining the same
accuracy as independently trained networks, saving the non-recurring
engineering (NRE) cost. Extensive experiments on various hardware platforms
(Mobile/CPU/GPU) and efficiency constraints show that OFA consistently matches
or exceeds the ImageNet accuracy of state-of-the-art (SOTA) neural architecture
search (NAS) methods. Remarkably, OFA is orders of magnitude faster than NAS at
handling multiple deployment scenarios ($N$). With $N = 40$, OFA requires 14x
fewer GPU hours than ProxylessNAS, 16x fewer GPU hours than FBNet, and 1,142x fewer
GPU hours than MnasNet. The more deployment scenarios, the more savings over
NAS.
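To make the ideas above concrete, here is a minimal Python sketch of (a) counting the sub-networks encoded by one once-for-all network and (b) sampling sub-network configurations under a progressive-shrinking schedule. The constants follow the mobile setting reported in the paper (5 units; per-unit depth in {2, 3, 4}; per-layer kernel size in {3, 5, 7} and width expansion ratio in {3, 4, 6}); the function names (`space_size`, `sample_config`) and the stage list are illustrative assumptions, not the authors' actual API, and the weight-sharing training step is elided.

```python
import random

# Design-space constants following the mobile setting reported in the
# OFA paper: 5 units; per-unit depth in {2, 3, 4}; per-layer kernel
# size in {3, 5, 7} and width expansion ratio in {3, 4, 6}. Input
# resolution is also elastic in the paper (sampled per batch) but is
# omitted from the sub-network count here.
NUM_UNITS = 5
DEPTHS = [2, 3, 4]
KERNELS = [3, 5, 7]
WIDTHS = [3, 4, 6]


def space_size() -> int:
    """Count the sub-networks encoded by one once-for-all network."""
    per_layer = len(KERNELS) * len(WIDTHS)          # 9 (kernel, width) pairs
    per_unit = sum(per_layer ** d for d in DEPTHS)  # 81 + 729 + 6561 = 7371
    return per_unit ** NUM_UNITS                    # 7371**5, about 2.2e19


def sample_config(depths, kernels, widths):
    """Sample one sub-network configuration from the allowed choices."""
    config = []
    for _ in range(NUM_UNITS):
        depth = random.choice(depths)
        config.append([
            {"kernel": random.choice(kernels), "width": random.choice(widths)}
            for _ in range(depth)
        ])
    return config


# Progressive shrinking: start from the largest network only, then
# unlock one elastic dimension per stage (kernel -> depth -> width),
# fine-tuning the shared weights at each stage so that newly added
# small sub-networks do not degrade the large ones.
STAGES = [
    ("full network",   [max(DEPTHS)], [max(KERNELS)], [max(WIDTHS)]),
    ("elastic kernel", [max(DEPTHS)], KERNELS,        [max(WIDTHS)]),
    ("elastic depth",  DEPTHS,        KERNELS,        [max(WIDTHS)]),
    ("elastic width",  DEPTHS,        KERNELS,        WIDTHS),
]

if __name__ == "__main__":
    print(f"total sub-networks: {space_size():.2e}")  # ~2.18e+19
    for name, depths, kernels, widths in STAGES:
        # In real training, each step would sample a few sub-networks,
        # run forward/backward through the shared supernet weights, and
        # update; here we only show what gets sampled at each stage.
        cfg = sample_config(depths, kernels, widths)
        print(f"{name}: unit-0 layers = {cfg[0]}")
```

Each stage enlarges the set of choices the sampler may draw from, which is how progressive shrinking avoids interference: smaller sub-networks are introduced only after the larger ones they share weights with have already been trained.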