TY - GEN
T1 - Region of Interest Enabled Learned Image Coding for Machines
AU - Ahonen, Jukka I.
AU - Le, Nam
AU - Zhang, Honglei
AU - Cricri, Francesco
AU - Rahtu, Esa
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
AB - Image and video coding for machines has been recently gaining more and more interest from both the industry and the research community. One successful approach is based on end-to-end (E2E) learned compression and has shown significant gains over the state-of-the-art conventional image coding methods. However, one of the remaining challenges for such E2E-learned image codecs for machines is to adaptively allocate the bits over different regions of the image, while retaining the machine vision performance. In this paper, we propose a method that leverages Regions-Of-Interest (ROIs) for bitrate allocation within a Learned Image Codec (LIC) for machines. In particular, the proposed method reduces the bits allocated for the background regions of the image by reducing the variance of the elements corresponding to the background regions in the latent representation. This results in more heavily quantized background areas, while keeping the quality of the ROI areas suitable for machine tasks. The proposed method achieves significant gains, -15.80% and -22.43% Pareto BD-rate reduction, over the baseline LIC on object detection and instance segmentation tasks, respectively. To the best of our knowledge, this is the first research paper proposing an ROI-based inference-time technology for Learned Image Coding for machines.
KW - learned image coding
KW - machine vision
KW - neural networks
KW - region of interest
KW - video coding for machines
U2 - 10.1109/MMSP59012.2023.10337731
DO - 10.1109/MMSP59012.2023.10337731
M3 - Conference contribution
AN - SCOPUS:85181582805
T3 - IEEE International Workshop on Multimedia Signal Processing
SP - 1
EP - 6
BT - 2023 IEEE 25th International Workshop on Multimedia Signal Processing (MMSP)
PB - IEEE
T2 - IEEE International Workshop on Multimedia Signal Processing (MMSP)
Y2 - 27 September 2023 through 29 September 2023
ER -