Shedding the Bits: Pushing the Boundaries of Quantization with Minifloats on FPGAs

Shivam Aggarwal, Hans Jakob Damsgaard, Alessandro Pappalardo, Giuseppe Franco, Thomas B. Preußer, Michaela Blott, Tulika Mitra

Research output: Conference article, Scientific, peer-reviewed

2 Citations (Scopus)

Abstract

Post-training quantization (PTQ) is a powerful technique for model compression, reducing the numerical precision in neural networks without additional training overhead. Recent works have investigated adopting 8-bit floating-point formats (FP8) in the context of PTQ for model inference. However, floating-point formats smaller than 8 bits and their relative comparison in terms of accuracy-hardware cost with integers remain unexplored on FPGAs. In this work, we present minifloats, which are reduced-precision floating-point formats capable of further reducing the memory footprint, latency, and energy cost of a model while approaching full-precision model accuracy. We implement a custom FPGA-based multiply-accumulate operator library and explore the vast design space, comparing minifloat and integer representations across 3 to 8 bits for both weights and activations. We also examine the applicability of various integer-based quantization techniques to minifloats. Our experiments show that minifloats offer a promising alternative for emerging workloads such as vision transformers.
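
To make the idea of a minifloat concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes an IEEE-754-like layout (one sign bit, `exp_bits` exponent bits, `man_bits` mantissa bits, subnormals at exponent 0), a default bias of 2^(exp_bits-1) - 1, and no encodings reserved for infinity or NaN; the abstract does not specify these details. The function names `decode_minifloat`, `representable_values`, and `quantize` are hypothetical.

```python
def decode_minifloat(bits, exp_bits, man_bits, bias=None):
    """Decode an integer bit pattern into its real value.

    Assumed layout (not taken from the paper): [sign | exponent | mantissa],
    subnormals when the exponent field is 0, no inf/NaN encodings.
    """
    if bias is None:
        bias = 2 ** (exp_bits - 1) - 1  # assumed IEEE-style default bias
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    frac = man / 2 ** man_bits
    if exp == 0:
        # Subnormal: no implicit leading one, fixed exponent 1 - bias.
        return sign * frac * 2.0 ** (1 - bias)
    # Normal: implicit leading one.
    return sign * (1.0 + frac) * 2.0 ** (exp - bias)


def representable_values(exp_bits, man_bits, bias=None):
    """Enumerate every distinct value of the format, sorted ascending."""
    total_bits = 1 + exp_bits + man_bits
    return sorted({decode_minifloat(b, exp_bits, man_bits, bias)
                   for b in range(1 << total_bits)})


def quantize(x, values):
    """Round x to the nearest representable value (saturates at the ends)."""
    return min(values, key=lambda v: abs(v - x))


# A 4-bit format with 2 exponent bits and 1 mantissa bit.
fp4 = representable_values(exp_bits=2, man_bits=1)
print([v for v in fp4 if v >= 0])
print(quantize(2.4, fp4), quantize(100.0, fp4))
```

Under these assumptions, the non-negative values of the 4-bit format are 0, 0.5, 1, 1.5, 2, 3, 4, and 6: the spacing grows with magnitude, unlike a uniform integer grid. Post-training quantization to such a format usually also involves a scale factor per tensor or channel, which this sketch omits.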

Original language: English
Title: Proceedings - 2024 34th International Conference on Field-Programmable Logic and Applications, FPL 2024
Publisher: IEEE
Pages: 297-303
Number of pages: 7
ISBN (electronic): 979-8-3315-3007-5
DOI - permanent links
Status: Published - 2024
OKM publication type: A4 Article in conference proceedings
Event: International Conference on Field-Programmable Logic and Applications - Torino, Italy
Duration: 2 Sept 2024 to 6 Sept 2024

Publication series

Name
ISSN (electronic): 1946-1488

Conference

Conference: International Conference on Field-Programmable Logic and Applications
Country/Territory: Italy
City: Torino
Period: 2/09/24 to 6/09/24

Publication Forum level

  • Jufo level 1

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software
