TY - BOOK
T1 - From Parallel Programs to Customized Parallel Processors
AU - Jääskeläinen, Pekka
N1 - Awarding institution: Tampereen teknillinen yliopisto - Tampere University of Technology
PY - 2012/11/8
Y1 - 2012/11/8
N2 - The need for a fast time to market for new embedded processor-based designs calls for a rapid design methodology for the processors they include. The need for such a methodology is even greater for so-called soft cores targeted at reconfigurable fabrics, where per-design processor customization is commonplace.
The C language has commonly been used as the input to hardware/software co-design flows. However, because C is a sequential language, its ability to expose parallel operations that can utilize naturally parallel hardware constructs is far from optimal. This limits how well parallel resources scale in the resulting customized processor design space. In contrast, when a parallel programming language is used as the input, a wider processor design space can be explored, producing customized processors that exploit varying degrees of parallelism.
This thesis proposes a novel Multicore Application-Specific Instruction Set Processor (MCASIP) co-design methodology that uses parallel programming languages as the application input format. In this methodology, the designer uses a parallel programming language to capture the parallelism of the algorithm explicitly and to exploit specialized instructions, instead of being at the mercy of the compiler or the hardware to extract the parallelism from a sequential input. The thesis contributes a multicore processor template based on the Transport Triggered Architecture (TTA), compiler techniques for the static parallelization of computation kernels that contain barriers, and a hardware accelerator integrated into the datapath for implementing software synchronization with low overhead. These contributions enable the customized processors to scale at both the instruction and task levels, efficiently exploiting the parallelism in the input program up to implementation constraints such as memory bandwidth or chip area. The contributions are validated with case studies, comparisons and design examples.
M3 - Doctoral thesis
SN - 978-952-15-2932-0
T3 - Tampere University of Technology. Publication
BT - From Parallel Programs to Customized Parallel Processors
PB - Tampere University of Technology
ER -