NN-VVC: A Hybrid Learned-Conventional Video Codec Targeting Humans and Machines

Jukka I. Ahonen, Nam Le, Honglei Zhang, Antti Hallapuro, Francesco Cricri, Hamed Rezazadegan Tavakoli, Miska M. Hannuksela, Esa Rahtu

Research output: Article, Scientific, Peer-reviewed

Abstract

Advancements in artificial intelligence have significantly increased the use of images and videos in machine analysis algorithms, predominantly neural networks. However, the traditional methods of compressing, storing and transmitting media have been optimized for human viewers rather than machines. Current research in coding images and videos for machine analysis has evolved in two distinct paths. The first is characterized by End-to-End (E2E) learned codecs, which show promising results in image coding but have yet to match the performance of leading Conventional Video Codecs (CVC) and suffer from a lack of interoperability. The second path optimizes CVC, such as the Versatile Video Coding (VVC) standard, for machine-oriented reconstruction. Although CVC-based approaches enjoy widespread hardware and software compatibility and interoperability, they often fall short in machine task performance, especially at lower bitrates. This paper proposes a novel hybrid codec for machines named NN-VVC, which combines the advantages of an E2E-learned image codec and a CVC to achieve high performance in both image and video coding for machines. Our experiments show that the proposed system achieved up to -43.20% and -26.8% Bjøntegaard Delta rate reduction over VVC for image and video data, respectively, when evaluated on multiple different datasets and machine vision tasks according to the common test conditions designed by the VCM study group in MPEG standardization activities. Furthermore, to improve reconstruction quality, we introduce a human-focused branch into our codec, enhancing the visual appeal of reconstructions intended for human supervision of the machine-oriented main branch.

Original language: English
Pages: 689-712
Number of pages: 24
Journal: INTERNATIONAL JOURNAL OF SEMANTIC COMPUTING
Volume: 18
Issue number: 4
Publication status: Published - 2024
OKM publication type: A1 Original article in a scientific journal

Publication Forum level

  • Jufo level 1

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Linguistics and Language
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence
