End-to-End Transformer for Compressed Video Quality Enhancement

Li Yu, Wenshuai Chang, Shiyu Wu, Moncef Gabbouj

Research output: Contribution to journal › Article › Scientific › peer-review

2 Citations (Scopus)
12 Downloads (Pure)


Convolutional neural networks (CNNs) have achieved excellent results on the compressed video quality enhancement task in recent years. State-of-the-art methods explore the spatio-temporal information of adjacent frames mainly through deformable convolution. However, CNN-based methods can only exploit local information and therefore fail to explore global information. Moreover, current methods enhance video quality at a single scale, ignoring multi-scale information, which corresponds to information at different receptive fields and is crucial for correlation modeling. Therefore, in this work, we propose a Transformer-based compressed video quality enhancement (TVQE) method, consisting of a Transformer-based Spatio-Temporal feature Fusion (TSTF) module and a Multi-scale Channel-wise Attention based Quality Enhancement (MCQE) module. The proposed TSTF module learns both local and global features for correlation modeling, and its window-based Transformer and encoder-decoder structure greatly improve execution efficiency. The proposed MCQE module calculates multi-scale channel attention, which aggregates the temporal information between channels in the feature map at multiple scales, achieving efficient fusion of inter-frame information. Extensive experiments on the JCT-VC test sequences show that the proposed method increases PSNR by up to 0.98 dB when QP = 37. Meanwhile, compared to competing methods at 720p resolution, the inference speed is improved by up to 9.4% and the number of FLOPs is reduced by up to 84.4%. Moreover, the proposed method achieves a BD-rate reduction of up to 23.04%.
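The abstract reports its quality gain as an increase in PSNR, measured in dB. As a minimal sketch of how such a gain is typically computed, the example below evaluates PSNR for a compressed and an enhanced signal against the same reference and takes the difference. The function names `psnr` and `delta_psnr`, the 8-bit peak value of 255, and the toy pixel values are assumptions made for illustration; they are not taken from the paper or its evaluation code.

```python
import math


def psnr(reference, distorted, peak=255.0):
    """Return the PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and have the same length")
    # Mean squared error over all pixel positions.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


def delta_psnr(reference, compressed, enhanced, peak=255.0):
    """PSNR gain in dB of the enhanced signal over the compressed one."""
    return psnr(reference, enhanced, peak) - psnr(reference, compressed, peak)


# Toy data (hypothetical): the enhanced pixels sit closer to the reference.
reference = [100, 120, 140, 160]
compressed = [96, 124, 136, 164]  # every pixel off by 4
enhanced = [98, 122, 138, 162]    # every pixel off by 2

gain = delta_psnr(reference, compressed, enhanced)
print(f"delta PSNR: {gain:.2f} dB")
```

In practice the metric is computed per frame on full images, usually on the luminance channel, and averaged over a sequence; the flat lists here only keep the sketch free of dependencies.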

Original language: English
Number of pages: 11
Journal: IEEE Transactions on Broadcasting
Publication status: E-pub ahead of print - 29 Nov 2023
Publication type: A1 Journal article-refereed


Keywords

  • Compressed video quality enhancement
  • Correlation
  • deep learning
  • Image coding
  • Optical imaging
  • Streaming media
  • Task analysis
  • transformer
  • Transformers
  • video compression
  • Video recording

Publication forum classification

  • Publication forum level 1

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering

