Lightweight Multitask Learning for Robust JND Prediction using Latent Space and Reconstructed Frames

Sanaz Nami, Farhad Pakdaman, Mahmoud Reza Hashemi, Shervin Shirmohammadi, Moncef Gabbouj

Research output: Contribution to journal › Article › Scientific › peer-review


Abstract

The Just Noticeable Difference (JND) refers to the smallest distortion in an image or video that can be perceived by the Human Visual System (HVS), and is widely used in optimizing image/video compression. However, accurate JND modeling is very challenging due to its content dependence and the complex nature of the HVS. Recent solutions train deep-learning-based JND prediction models, mainly based on a Quantization Parameter (QP) value representing a single JND level, and train separate models to predict each JND level. We point out that a single QP-distance is insufficient to properly train a network with millions of parameters for a complex, content-dependent task. Inspired by recent advances in learned compression and multitask learning, we propose to address this problem by (1) learning to reconstruct the JND-quality frames jointly with the QP prediction, and (2) jointly learning several JND levels to augment the learning performance. We propose a novel solution where, first, an effective feature backbone is trained by learning to reconstruct JND-quality frames from the raw frames. Second, JND prediction models are trained based on features extracted from the latent space (i.e., the compressed domain) or from the reconstructed JND-quality frames. Third, a multi-JND model is designed, which jointly learns three JND levels, further reducing the prediction error. Extensive experimental results demonstrate that our multi-JND method outperforms the state of the art and achieves an average JND1 prediction error of only 1.57 in QP and 0.72 dB in PSNR. Moreover, the multitask learning approach and compressed-domain prediction facilitate lightweight inference by significantly reducing the complexity and the number of parameters.
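To make the joint objective described in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the multitask loss is a weighted sum of a frame-reconstruction term (mean squared error) and a QP-prediction term averaged over several JND levels. The function names, the choice of MSE and absolute error, and the weights `recon_weight` and `qp_weight` are all hypothetical and do not come from the paper. A PSNR helper is included because the abstract reports prediction error in dB.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length sequences of pixel values."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB for a given peak value; infinite when the inputs are identical."""
    err = mse(reference, distorted)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def multitask_loss(recon, target_frame, qp_pred, qp_true,
                   recon_weight=1.0, qp_weight=1.0):
    """Weighted sum of a reconstruction term and a multi-level QP term.

    qp_pred and qp_true hold one value per JND level (three in the abstract).
    Both weights are hypothetical hyperparameters, not values from the paper.
    """
    if len(qp_pred) != len(qp_true) or not qp_true:
        raise ValueError("QP lists must be non-empty and of equal length")
    recon_term = mse(recon, target_frame)
    # Mean absolute QP error across all JND levels learned jointly.
    qp_term = sum(abs(p - t) for p, t in zip(qp_pred, qp_true)) / len(qp_true)
    return recon_weight * recon_term + qp_weight * qp_term
```

For example, with a two-pixel toy frame and three JND levels, `multitask_loss([10, 20], [10, 22], [27, 32, 37], [26, 32, 39])` evaluates to `3.0`: the reconstruction term is 2.0 and the mean absolute QP error is 1.0. In an actual training setup the pixel sequences would be tensors produced by a network, and the relative weighting of the two terms would need tuning.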

Original language: English
Pages (from-to): 1
Number of pages: 1
Journal: IEEE Transactions on Circuits and Systems for Video Technology
DOIs
Publication status: E-pub ahead of print - 2024
Publication type: A1 Journal article-refereed

Keywords

  • Compressed Domain
  • Distortion
  • Frequency-domain analysis
  • Human Visual System (HVS)
  • Image reconstruction
  • Just Noticeable Difference (JND)
  • Multitask Learning
  • Predictive models
  • Streaming media
  • Training
  • Visualization

Publication forum classification

  • Publication forum level 2

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering
