Abstract
Sophisticated computational imaging algorithms require both high performance and good energy efficiency when executed on mobile devices. A recent trend has been to exploit the abundant data-level parallelism found in general-purpose programmable GPUs. However, for low-power mobile use cases, generic GPUs consume excessive amounts of power. This paper proposes a programmable computational imaging processor with 16-bit half-precision SIMD floating-point vector processing capabilities combined with the power efficiency of an exposed datapath. In comparison to traditional VLIW architectures with similar computational resources, the exposed datapath reduces register file traffic and complexity. These properties, together with the specific optimizations enabled by the explicit programming model, yield very good power-performance. When synthesized on a 28 nm ASIC technology, the accelerator consumes 71 mW of power while running a state-of-the-art denoising algorithm, and occupies only 0.2 mm² of chip area. For this algorithm, energy usage per frame is 7 mJ, which is 10x less than the best GPU-based implementation found.
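The power and energy figures quoted in the abstract can be related through E = P · t. The following is a minimal sketch of that arithmetic, not a result reported by the authors: it assumes the 71 mW figure is the average power over a whole frame and that the 7 mJ figure refers to the same run, and it uses the rounded values as published. The names `frame_time_s`, `POWER_W`, `ENERGY_PER_FRAME_J`, and `GPU_ENERGY_RATIO` are illustrative only.

```python
# Figures quoted in the abstract (rounded as published).
POWER_W = 71e-3             # 71 mW while running the denoising algorithm
ENERGY_PER_FRAME_J = 7e-3   # 7 mJ per frame
GPU_ENERGY_RATIO = 10       # "10x less" than the best GPU implementation found


def frame_time_s(energy_j: float, power_w: float) -> float:
    """Time per frame implied by E = P * t.

    Assumption (not stated in the abstract): power is the constant
    average draw over the whole frame.
    """
    return energy_j / power_w


t = frame_time_s(ENERGY_PER_FRAME_J, POWER_W)
fps = 1.0 / t
# Reading "10x less" as: the GPU baseline uses about 10x the energy per frame.
gpu_energy_j = ENERGY_PER_FRAME_J * GPU_ENERGY_RATIO

print(f"implied frame time: {t * 1e3:.1f} ms")
print(f"implied frame rate: {fps:.1f} frames/s")
print(f"implied GPU energy per frame: {gpu_energy_j * 1e3:.0f} mJ")
```

Under these assumptions the figures imply roughly 99 ms per frame, or about 10 frames per second, and a GPU baseline of about 70 mJ per frame. Because the inputs are rounded, the derived values are only indicative.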
Original language | English |
---|---|
Title of host publication | 2016 IEEE Nordic Circuits and Systems Conference (NORCAS) |
Publisher | IEEE |
Number of pages | 6 |
ISBN (Electronic) | 978-1-5090-1095-0 |
Publication status | Published - 2016 |
Publication type | A4 Article in conference proceedings |
Event | Nordic circuits and systems conference - Duration: 1 Jan 2000 → … |
Conference
Conference | Nordic circuits and systems conference |
---|---|
Period | 1/01/00 → … |
Publication forum classification
- Publication forum level 1
ASJC Scopus subject areas
- Hardware and Architecture
- Signal Processing