Inproceedings,

Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency

M. Beyer, S. Gesper, A. Guntoro, G. Paya-Vaya, and H. Blume.
Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023, pages 61--68. United States, Institute of Electrical and Electronics Engineers Inc., (2023). Funding Information: This work is supported by the German federal ministry of education and research (BMBF), project ZuSE-KI-AVF (grant no. 16ME0062). Conference: 34th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023. Conference date: 19-07-2023 through 21-07-2023.
DOI: 10.1109/ASAP57973.2023.00023

Abstract

Neural networks (NNs) are quantized to decrease their computational demands and reduce their memory footprint. However, taking advantage of such optimizations requires specialized hardware that supports computations with low bit widths. In this work, we propose permutations on the subword level that build on top of multi-bit-width multiply-accumulate operations to effectively support low-bit-width computations of quantized NNs. By applying this technique, we extend the data reuse and further improve compute performance for convolution operations compared to simple vectorization using SIMD (single-instruction-multiple-data). We perform a design space exploration using a cycle-accurate simulation with MobileNet and VGG16 on a vector-based processor. The results show a speedup of up to 3.7× and a reduction of up to 1.9× in required data transfers. Additionally, the control overhead for orchestrating the computation is decreased by up to 3.9×.

BibTeX key: 652b927dcaba485face632adffdbac25
entry type: inproceedings
address: United States
booktitle: Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023
year: 2023
pages: 61--68
publisher: Institute of Electrical and Electronics Engineers Inc.
series: Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors
isbn: 979-8-3503-4686-2
language: English
DOI: 10.1109/ASAP57973.2023.00023
note: Funding Information: This work is supported by the German federal ministry of education and research (BMBF), project ZuSE-KI-AVF (grant no. 16ME0062).; 34th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023 ; Conference date: 19-07-2023 Through 21-07-2023


Cite this publication

@inproceedings{652b927dcaba485face632adffdbac25,
  abstract = {Neural networks (NNs) are quantized to decrease their computational demands and reduce their memory footprint. However, specialized hardware is required that supports computations with low bit widths to take advantage of such optimizations. In this work, we propose permutations on subword level that build on top of multi-bit-width multiply-accumulate operations to effectively support low bit width computations of quantized NNs. By applying this technique, we extend the data reuse and further improve compute performance for convolution operations compared to simple vectorization using SIMD (single-instruction-multiple-data). We perform a design space exploration using a cycle accurate simulation with MobileNet and VGG16 on a vector-based processor. The results show a speedup of up to 3.7 × and a reduction of up to 1.9 × for required data transfers. Additionally, the control overhead for orchestrating the computation is decreased by up to 3.9 ×.},
  added-at = {2024-02-05T16:12:22.000+0100},
  address = {United States},
  author = {Beyer, Michael and Gesper, Sven and Guntoro, Andre and Paya-Vaya, Guillermo and Blume, Holger},
  biburl = {https://www.bibsonomy.org/bibtex/25370328590846ac4b3dc5ab9f4993025/fabcho},
  booktitle = {Proceedings - 2023 IEEE 34th International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023},
  doi = {10.1109/ASAP57973.2023.00023},
  interhash = {ad8d78822966b5aaf28b0e55b948f128},
  intrahash = {5370328590846ac4b3dc5ab9f4993025},
  isbn = {979-8-3503-4686-2},
  keywords = {Application-Specific CNN, Hardware, Network Neural Permutation Processor, Subword myown},
  language = {English},
  note = {Funding Information: This work is supported by the German federal ministry of education and research (BMBF), project ZuSE-KI-AVF (grant no. 16ME0062).; 34th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2023 ; Conference date: 19-07-2023 Through 21-07-2023},
  pages = {61--68},
  publisher = {Institute of Electrical and Electronics Engineers Inc.},
  series = {Proceedings of the International Conference on Application-Specific Systems, Architectures and Processors},
  timestamp = {2024-03-05T15:41:56.000+0100},
  title = {Exploiting Subword Permutations to Maximize CNN Compute Performance and Efficiency},
  year = 2023
}
