TY - JOUR
T1 - Efficient OpenCL system integration of non-blocking FPGA accelerators
AU - Leppänen, Topi
AU - Lotvonen, Atro
AU - Mousouliotis, Panagiotis
AU - Multanen, Joonas
AU - Keramidas, Georgios
AU - Jääskeläinen, Pekka
N1 - Funding Information:
The work for this publication was supported by European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 871738 (CPSoSaware) and Academy of Finland (decision #331344 ). We would also like to thank Xilinx for donating the Alveo FPGA and its related software used in this work and HSA Foundation for the financial support and the useful specification work.
Funding Information:
The work for this publication was supported by European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 871738 (CPSoSaware) and Academy of Finland (decision #331344). We would also like to thank Xilinx for donating the Alveo FPGA and its related software used in this work and HSA Foundation for the financial support and the useful specification work.
Publisher Copyright:
© 2023 The Author(s)
PY - 2023/3
Y1 - 2023/3
N2 - OpenCL functions as a portability layer for diverse heterogeneous hardware platforms including CPUs, GPUs, FPGAs, and hardware accelerators. However, OpenCL programs utilizing multiple of these devices in the same computing platform suffer from poor coordination between OpenCL implementations of different hardware vendors. This paper proposes a vendor-independent open source method for integrating custom FPGA accelerators into a common OpenCL platform. The accelerators are wrapped in a common hardware interface to enable efficient synchronization and data sharing between devices on the same chip. The provided software connects the accelerator to OpenCL runtime and enables the control of diverse FPGA accelerators with OpenCL command queues. The benefits of the integration methodology are demonstrated by creating FPGA accelerators with different development tools and integrating them together on two different types of FPGA devices while showing minimal integration overhead. Direct memory access of the accelerator to external memory is shown to increase the performance by a factor of 8. Non-blocking execution enabled by the on-chip synchronization between devices is shown to remove a 250 μs overhead from dependent kernel launches. Additionally, as a proof of concept and a case study, a fully OpenCL-controllable computing platform with two devices is implemented on an FPGA to compute CNN inference on a real-world input signal.
AB - OpenCL functions as a portability layer for diverse heterogeneous hardware platforms including CPUs, GPUs, FPGAs, and hardware accelerators. However, OpenCL programs utilizing multiple of these devices in the same computing platform suffer from poor coordination between OpenCL implementations of different hardware vendors. This paper proposes a vendor-independent open source method for integrating custom FPGA accelerators into a common OpenCL platform. The accelerators are wrapped in a common hardware interface to enable efficient synchronization and data sharing between devices on the same chip. The provided software connects the accelerator to OpenCL runtime and enables the control of diverse FPGA accelerators with OpenCL command queues. The benefits of the integration methodology are demonstrated by creating FPGA accelerators with different development tools and integrating them together on two different types of FPGA devices while showing minimal integration overhead. Direct memory access of the accelerator to external memory is shown to increase the performance by a factor of 8. Non-blocking execution enabled by the on-chip synchronization between devices is shown to remove a 250 μs overhead from dependent kernel launches. Additionally, as a proof of concept and a case study, a fully OpenCL-controllable computing platform with two devices is implemented on an FPGA to compute CNN inference on a real-world input signal.
KW - FPGA
KW - Hardware acceleration
KW - Hardware integration
KW - Heterogeneous computing
KW - OpenCL
U2 - 10.1016/j.micpro.2023.104772
DO - 10.1016/j.micpro.2023.104772
M3 - Article
AN - SCOPUS:85146435215
SN - 0141-9331
VL - 97
JO - Microprocessors and Microsystems
JF - Microprocessors and Microsystems
M1 - 104772
ER -