Abstract
Determining the best set of optimizations to apply to a kernel
to be executed on the graphics processing unit (GPU) is a
challenging problem. There are large sets of possible
optimization configurations that can be applied, and many
applications have multiple kernels. Each kernel may require a
specific configuration to achieve the best performance, and
moving an application to new hardware often requires a new
optimization configuration for each kernel. In this work, we
apply optimizations to GPU code using HMPP, a high-level
directive-based language and source-to-source compiler that can
generate CUDA/OpenCL code. However, programming in a
high-level language can mean a loss of performance compared to
using a low-level language. Our work shows that it is possible
to improve the performance of code written in a high-level
language by using auto-tuning. We perform auto-tuning over a
large optimization space for GPU kernels, focusing on loop
permutation, loop
unrolling, tiling, and specifying which loop(s) to parallelize,
and show results on convolution kernels, codes in the PolyBench
suite, and an implementation of belief propagation for stereo
vision. The results show that our auto-tuned HMPP-generated
implementations are significantly faster than the default HMPP
implementation and can meet or exceed the performance of
manually coded CUDA/OpenCL implementations.
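The auto-tuning loop described above can be sketched as an exhaustive search over a configuration space. This is a minimal illustrative sketch, not the paper's actual tool: the parameter ranges, the configuration names, and the timing function are hypothetical placeholders, and a real tuner would compile and run each HMPP-generated GPU kernel instead of evaluating a synthetic cost.

```python
import itertools

# Hypothetical optimization knobs of the kind the abstract mentions:
# tile sizes, unroll factors, and which loop ordering to use.
TILE_SIZES = [8, 16, 32]
UNROLL_FACTORS = [1, 2, 4]
LOOP_ORDERS = ["ij", "ji"]

def measure_time(tile, unroll, order):
    """Stand-in for compiling and timing one generated kernel variant.

    Returns a synthetic cost so the sketch is runnable; a real
    auto-tuner would build the variant and time it on the GPU.
    """
    return abs(tile - 16) + abs(unroll - 4) + (0 if order == "ij" else 1)

def autotune():
    # Enumerate the full cross-product of configurations and keep
    # the one with the lowest measured time.
    space = itertools.product(TILE_SIZES, UNROLL_FACTORS, LOOP_ORDERS)
    return min(space, key=lambda cfg: measure_time(*cfg))

best = autotune()
```

In practice the space grows multiplicatively with each knob and each kernel, which is why the abstract emphasizes that every kernel, and every hardware target, may need its own configuration.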