Abstract

Convolutional neural networks (CNNs) have proven highly successful in the field of artificial intelligence (AI). Deploying CNNs on embedded devices at large scale would significantly advance the practical adoption of AI across industries. However, the memory and operation requirements of CNNs pose challenges for computing performance, memory bandwidth, and the flexibility of the executing hardware. This paper introduces a framework that addresses these issues through model quantization and hardware acceleration on a scalable vertical vector processor architecture. First, the framework includes a layer-fusion method designed to optimize hardware utilization. Second, data storage is optimized to improve memory efficiency. Third, CNNs are mapped onto the vertical vector processing concept of the hardware accelerator. The effectiveness of the proposed framework is evaluated by analyzing accelerator efficiency on a field-programmable gate array (FPGA). The results demonstrate that the framework offers flexibility, configurability, and efficient mapping for typical CNN implementations, achieving up to 84% of the vector processor's peak performance on the VGG network.