
Implementation and analysis of the histograms of oriented Gradients algorithm on a heterogeneous multicore CPU/GPU architecture

2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 1402-1406. (December 2015)
DOI: 10.1109/GlobalSIP.2015.7418429

Abstract

Due to the integration of multiple heterogeneous processing units on a single die, programmers can make use of processors with various features. For instance, the Samsung Exynos 5 Octa mobile processor features two ARM CPU clusters (Cortex-A7/A15), a mobile Mali GPU, and a dedicated image processor (codec). However, data transfer delays and missing data coherency between clusters complicate heterogeneous programming. Programs should not only scale over multiple cores, but also distribute the work over heterogeneous processing units. Depending on the algorithm and platform characteristics, selecting a proper partitioning scheme is a challenging task. In this work, we present a heterogeneous implementation of the Histograms of Oriented Gradients algorithm, a key algorithm in the field of driver assistance systems, as a case study. The implementation targets the CPU clusters and the GPU of the Samsung Exynos 5 Octa 5422. To find the best partitioning scheme, we discuss different strategies and analyze the computational capabilities as well as the power consumption of the individual processing units for the different algorithmic processing stages. We show that a GPU-only execution slows down the computation compared with the CPU-only version, while mapping to both devices (CPU and GPU) achieves a speedup of 1.68.
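
For readers unfamiliar with the algorithm, two of the processing stages that such a partitioning would distribute can be sketched as follows. This is a minimal, illustrative sketch in plain Python and not the paper's implementation: the function names `gradients` and `cell_histogram`, the centered-difference gradient with clamped borders, the 8x8 cell size, and the 9-bin unsigned orientation layout are assumptions taken from the common formulation of Histograms of Oriented Gradients.

```python
import math

def gradients(img):
    """Per-pixel gradient magnitude and unsigned orientation.

    Uses centered differences; border pixels are clamped (an assumption).
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)
            # Fold the direction into the unsigned range [0, 180) degrees.
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

def cell_histogram(mag, ang, y0, x0, cell=8, bins=9):
    """Magnitude-weighted orientation histogram of one cell.

    Hard binning without interpolation, to keep the sketch short.
    """
    hist = [0.0] * bins
    width = 180.0 / bins
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            # The modulo guards against an angle that rounds up to 180.0.
            b = int(ang[y][x] / width) % bins
            hist[b] += mag[y][x]
    return hist

if __name__ == "__main__":
    # Horizontal intensity ramp: every gradient points along the x axis.
    ramp = [[float(x) for x in range(8)] for _ in range(8)]
    m, a = gradients(ramp)
    print(cell_histogram(m, a, 0, 0))
```

Both stages are independent per pixel or per cell, which is the kind of property that makes it possible to assign different stages, or different image regions, to different processing units.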

