Abstract
Accelerator processors allow energy-efficient computation at
high performance, especially for computation-intensive
applications. There exists a plethora of different accelerator
architectures, such as GPUs and the Cell Broadband Engine. Each
accelerator has its own programming language, but the recently
introduced OpenCL language unifies accelerator programming
languages. OpenCL thereby achieves functional portability,
which reduces the development time of kernels. Functional
portability, however, has limited value without performance
portability: the possibility to re-use optimized kernels with
good performance. This paper investigates how specific code
optimizations are to each accelerator architecture and how
severe the lack of performance portability is.