Characterization of OpenCL on a scalable FPGA architecture
S. Gao and J. Chritz. ReConFigurable Computing and FPGAs (ReConFig), 2014
International Conference on, pages 1--6. (December 2014)
Abstract
The recent release of Altera's SDK for OpenCL has greatly eased
the development of FPGA-based systems. Research has shown
performance improvements brought by OpenCL using a single FPGA
device. However, to meet the objectives of high performance
computing, OpenCL needs to be evaluated using multiple FPGAs.
This work has proposed a scalable FPGA architecture for high
performance computing. The design includes multiple FPGA modules
and a high performance backplane. The modular nature of this
architecture supports the combination of different FPGAs, as
well as provides for easy hardware updates. FPGA modules based
on Stratix V are compatible with Altera's OpenCL tool flow. The
evaluation has tested the native IO performance of the
architecture and the results have demonstrated scalability using
six FPGAs. The host-to-device peak bandwidth is measured as 13.1
GB/s for read operation and 12.1 GB/s for write operation. The
FPGA-to-memory bandwidth is measured as 64.5 GB/s in total. An
OpenCL AES kernel is selected to test the scalable multi-FPGA
architecture. The test results have shown peak throughput is
achieved when six FPGAs are used. The throughput per watt shows
5× improvement using four FPGAs, over a
general-purpose processor.
%0 Conference Paper
%1 Gao2014-fw
%A Gao, Shanyuan
%A Chritz, J
%B ReConFigurable Computing and FPGAs (ReConFig), 2014
International Conference on
%D 2014
%K Altera_SDK Backplanes Bandwidth Computer_architecture FPGA FPGA-based_system FPGA-to-memory_bandwidth FPGA_module Field_programmable_gate_arrays Hardware_design_languages Kernel OpenCL OpenCL_AES_kernel OpenCL_tool_flow Stratix_V Throughput To_Read field_programmable_gate_arrays general-purpose_processor hardware_update high_performance_backplane high_performance_computing host-to-device_peak_bandwidth memory_architecture multiFPGA_architecture multiple_FPGA native_IO_performance parallel_processing peak_throughput read_operation scalable_FPGA_architecture single_FPGA_device write_operation
%P 1--6
%T Characterization of OpenCL on a scalable FPGA architecture
%X The recent release of Altera's SDK for OpenCL has greatly eased
the development of FPGA-based systems. Research has shown
performance improvements brought by OpenCL using a single FPGA
device. However, to meet the objectives of high performance
computing, OpenCL needs to be evaluated using multiple FPGAs.
This work has proposed a scalable FPGA architecture for high
performance computing. The design includes multiple FPGA modules
and a high performance backplane. The modular nature of this
architecture supports the combination of different FPGAs, as
well as provides for easy hardware updates. FPGA modules based
on Stratix V are compatible with Altera's OpenCL tool flow. The
evaluation has tested the native IO performance of the
architecture and the results have demonstrated scalability using
six FPGAs. The host-to-device peak bandwidth is measured as 13.1
GB/s for read operation and 12.1 GB/s for write operation. The
FPGA-to-memory bandwidth is measured as 64.5 GB/s in total. An
OpenCL AES kernel is selected to test the scalable multi-FPGA
architecture. The test results have shown peak throughput is
achieved when six FPGAs are used. The throughput per watt shows
5× improvement using four FPGAs, over a
general-purpose processor.
@inproceedings{Gao2014-fw,
abstract = {The recent release of Altera's SDK for OpenCL has greatly eased
the development of FPGA-based systems. Research has shown
performance improvements brought by OpenCL using a single FPGA
device. However, to meet the objectives of high performance
computing, OpenCL needs to be evaluated using multiple FPGAs.
This work has proposed a scalable FPGA architecture for high
performance computing. The design includes multiple FPGA modules
and a high performance backplane. The modular nature of this
architecture supports the combination of different FPGAs, as
well as provides for easy hardware updates. FPGA modules based
on Stratix V are compatible with Altera's OpenCL tool flow. The
evaluation has tested the native IO performance of the
architecture and the results have demonstrated scalability using
six FPGAs. The host-to-device peak bandwidth is measured as 13.1
GB/s for read operation and 12.1 GB/s for write operation. The
FPGA-to-memory bandwidth is measured as 64.5 GB/s in total. An
OpenCL AES kernel is selected to test the scalable multi-FPGA
architecture. The test results have shown peak throughput is
achieved when six FPGAs are used. The throughput per watt shows
5\texttimes{} improvement using four FPGAs, over a
general-purpose processor.},
added-at = {2015-04-11T18:41:09.000+0200},
author = {Gao, Shanyuan and Chritz, J},
biburl = {https://www.bibsonomy.org/bibtex/221c486e724a5987a33727c0a4e09fafb/christophv},
booktitle = {{ReConFigurable} Computing and {FPGAs} ({ReConFig}), 2014
International Conference on},
interhash = {ae50cd33a98f2b481c7ec687f0baf123},
intrahash = {21c486e724a5987a33727c0a4e09fafb},
keywords = {Altera_SDK Backplanes Bandwidth Computer_architecture FPGA FPGA-based_system FPGA-to-memory_bandwidth FPGA_module Field_programmable_gate_arrays Hardware_design_languages Kernel OpenCL OpenCL_AES_kernel OpenCL_tool_flow Stratix_V Throughput To_Read field_programmable_gate_arrays general-purpose_processor hardware_update high_performance_backplane high_performance_computing host-to-device_peak_bandwidth memory_architecture multiFPGA_architecture multiple_FPGA native_IO_performance parallel_processing peak_throughput read_operation scalable_FPGA_architecture single_FPGA_device write_operation},
month = dec,
pages = {1--6},
timestamp = {2015-04-11T18:41:09.000+0200},
title = {Characterization of {OpenCL} on a scalable {FPGA} architecture},
year = 2014
}