Tailored AVX2 Transform Kernels for Versatile Video Coding

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review



Transform coding tools play an integral part in video codecs due to their substantial impact on coding efficiency. The latest video coding standard, Versatile Video Coding (VVC), makes the most of these tools by introducing new DST7, DCT8, and non-square transforms alongside the conventional DCT2 transform. This paper proposes optimized AVX2 kernels for all these transforms to speed up VVC coding. Unlike existing solutions, our kernels are specially tailored for each VVC transform type and block size. Accelerating our open-source uvg266 VVC encoder with the proposed kernels yields up to a 1.1× speedup under the all-intra (AI) coding condition without any coding overhead. Our implementations make forward DCT2 and DST7/DCT8 transforms 4.0× and 6.7× as fast as their respective scalar implementations in the VTM reference encoder. They also outpace the AVX2 kernels of the practical VVenC encoder by factors of 3.0× and 2.8×. With inverse transforms, the respective speedups rise to 5.3×, 11.1×, 3.4×, and 3.0×.
Original language: English
Title of host publication: 2023 IEEE Nordic Circuits and Systems Conference (NorCAS)
Number of pages: 6
ISBN (Electronic): 979-8-3503-3757-0
ISBN (Print): 979-8-3503-3758-7
Publication status: Published - 31 Oct 2023
Publication type: A4 Article in conference proceedings
Event: IEEE Nordic Circuits and Systems Conference - Aalborg, Denmark
Duration: 30 Oct 2023 to 1 Nov 2023


Conference: IEEE Nordic Circuits and Systems Conference


  • versatile video coding (VVC)
  • transform
  • complexity reduction
  • Advanced Vector Extensions 2 (AVX2)
  • practical encoder implementation

Publication forum classification

  • Publication forum level 1

