TY - GEN
T1 - TTA-SIMD Soft Core Processors
AU - Tervo, Kati
AU - Malik, Samawat
AU - Leppänen, Topi
AU - Jääskeläinen, Pekka
N1 - JUFOID=73463
Funding Information:
This work is part of the FitOptiVis project [1] funded by the ECSEL Joint Undertaking under grant number H2020-ECSEL-2017-2-783162.
Publisher Copyright:
© 2020 IEEE.
PY - 2020
Y1 - 2020
N2 - Soft processors are an important tool in the Field Programmable Gate Array (FPGA) designer's toolkit, and their Single Instruction Multiple Data (SIMD) organizations are an efficient means to utilize the parallelism of FPGAs. However, state-of-the-art SIMD processors are hindered by the additional logic complexity resulting from dynamic features. By minimizing such constructs, it is possible to design soft processors that are efficient but still flexible enough to operate within an application domain. To this end, we propose a family of instruction-set-programmable, multi-issue, wide-SIMD soft cores. The template is based on a highly static Transport Triggered Architecture (TTA) and a design-time-customizable shuffle unit to minimize inefficient dynamic features while remaining compiler programmable. The cores are evaluated on the PYNQ-Z1 board against the ARM A9 hard processor system with NEON vector extensions. The proposed cores reach up to a 2.4x performance improvement over the ARM and fit SIMD units up to 1024 bits wide onto the relatively small FPGA while still operating above 100 MHz. The scalability of TTA enables state-of-the-art vector widths. The multicore scalability of the template is preliminarily tested with a 14-core design on an XCZU9EG FPGA customized for real-time convolutional neural network inference.
AB - Soft processors are an important tool in the Field Programmable Gate Array (FPGA) designer's toolkit, and their Single Instruction Multiple Data (SIMD) organizations are an efficient means to utilize the parallelism of FPGAs. However, state-of-the-art SIMD processors are hindered by the additional logic complexity resulting from dynamic features. By minimizing such constructs, it is possible to design soft processors that are efficient but still flexible enough to operate within an application domain. To this end, we propose a family of instruction-set-programmable, multi-issue, wide-SIMD soft cores. The template is based on a highly static Transport Triggered Architecture (TTA) and a design-time-customizable shuffle unit to minimize inefficient dynamic features while remaining compiler programmable. The cores are evaluated on the PYNQ-Z1 board against the ARM A9 hard processor system with NEON vector extensions. The proposed cores reach up to a 2.4x performance improvement over the ARM and fit SIMD units up to 1024 bits wide onto the relatively small FPGA while still operating above 100 MHz. The scalability of TTA enables state-of-the-art vector widths. The multicore scalability of the template is preliminarily tested with a 14-core design on an XCZU9EG FPGA customized for real-time convolutional neural network inference.
KW - Field-Programmable Gate Array (FPGA)
KW - Single Instruction Multiple Data (SIMD)
KW - Transport-Triggered Architecture (TTA)
U2 - 10.1109/FPL50879.2020.00023
DO - 10.1109/FPL50879.2020.00023
M3 - Conference contribution
AN - SCOPUS:85095584597
SN - 978-1-7281-9903-0
T3 - International Conference on Field Programmable Logic and Applications
SP - 79
EP - 84
BT - Proceedings - 30th International Conference on Field-Programmable Logic and Applications, FPL 2020
A2 - Mentens, Nele
A2 - Sousa, Leonel
A2 - Trancoso, Pedro
A2 - Pericas, Miquel
A2 - Sourdis, Ioannis
PB - IEEE
T2 - International Conference on Field-Programmable Logic and Applications
Y2 - 31 August 2020 through 4 September 2020
ER -