Region of Interest Enabled Learned Image Coding for Machines

Jukka I. Ahonen, Nam Le, Honglei Zhang, Francesco Cricri, Esa Rahtu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

Abstract

Image and video coding for machines has recently been gaining interest from both industry and the research community. One successful approach is based on end-to-end (E2E) learned compression, which has shown significant gains over state-of-the-art conventional image coding methods. However, one remaining challenge for such E2E-learned image codecs for machines is to allocate bits adaptively over different regions of the image while retaining machine vision performance. In this paper, we propose a method that leverages Regions-Of-Interest (ROIs) for bitrate allocation within a Learned Image Codec (LIC) for machines. In particular, the proposed method reduces the bits allocated to the background regions of the image by reducing the variance of the latent-representation elements that correspond to those regions. This results in more heavily quantized background areas, while keeping the quality of the ROI areas suitable for machine tasks. The proposed method achieves significant gains over the baseline LIC, with Pareto BD-rates of -15.80% and -22.43% on object detection and instance segmentation tasks, respectively. To the best of our knowledge, this is the first research paper proposing an ROI-based inference-time technique for Learned Image Coding for machines.
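The abstract does not spell out how the background variance is reduced, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a latent tensor of shape (C, H, W), a binary ROI mask already resampled to the latent resolution, and a hypothetical `scale` factor that shrinks background elements toward their per-channel mean. The names `shrink_background` and `empirical_entropy_bits` are illustrative, and the empirical entropy of the rounded symbols is used as a rough stand-in for the rate that a learned entropy model would actually assign.

```python
import numpy as np


def shrink_background(latent, roi_mask, scale=0.25):
    """Shrink background latent elements toward their per-channel mean.

    latent:   float array of shape (C, H, W), standing in for a codec latent
    roi_mask: bool array of shape (H, W), True inside regions of interest
    scale:    hypothetical factor in [0, 1]; smaller means lower variance
    """
    background = ~roi_mask
    out = latent.copy()
    if not background.any():
        return out  # nothing to shrink when the whole image is ROI
    bg_values = latent[:, background]                # shape (C, N)
    bg_mean = bg_values.mean(axis=1, keepdims=True)  # shape (C, 1)
    out[:, background] = bg_mean + scale * (bg_values - bg_mean)
    return out


def empirical_entropy_bits(symbols):
    """Empirical entropy in bits per symbol (a crude rate proxy only)."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


# Synthetic latent and a centered square ROI (assumed data, for illustration).
rng = np.random.default_rng(0)
latent = rng.normal(0.0, 4.0, size=(8, 16, 16))
roi_mask = np.zeros((16, 16), dtype=bool)
roi_mask[4:12, 4:12] = True
background = ~roi_mask

adjusted = shrink_background(latent, roi_mask, scale=0.25)

var_before = latent[:, background].var()
var_after = adjusted[:, background].var()

# Uniform scalar quantization by rounding, as commonly used at inference time.
bits_before = empirical_entropy_bits(np.round(latent[:, background]))
bits_after = empirical_entropy_bits(np.round(adjusted[:, background]))

print(f"background variance: {var_before:.2f} -> {var_after:.2f}")
print(f"background entropy (bits/symbol): {bits_before:.2f} -> {bits_after:.2f}")
```

In this toy setting the ROI elements are left untouched, while the background elements cluster into fewer quantization bins after rounding, which is the sense in which lower variance translates into coarser effective quantization and fewer bits.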

Original language: English
Title of host publication: 2023 IEEE 25th International Workshop on Multimedia Signal Processing (MMSP)
Publisher: IEEE
Pages: 1-6
ISBN (Electronic): 9798350338935
Publication status: Published - 2023
Publication type: A4 Article in conference proceedings
Event: IEEE International Workshop on Multimedia Signal Processing (MMSP) - Poitiers, France
Duration: 27 Sept 2023 → 29 Sept 2023

Publication series

Name: IEEE International Workshop on Multimedia Signal Processing
Publisher: IEEE
ISSN (Electronic): 2473-3628

Conference

Conference: IEEE International Workshop on Multimedia Signal Processing (MMSP)
Country/Territory: France
City: Poitiers
Period: 27/09/23 → 29/09/23

Keywords

  • learned image coding
  • machine vision
  • neural networks
  • region of interest
  • video coding for machines

Publication forum classification

  • Publication forum level 1

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Media Technology
