TY - JOUR
T1 - Neural texture transfer assisted video coding with adaptive up-sampling
AU - Yu, Li
AU - Chang, Wenshuai
AU - Quan, Weize
AU - Xiao, Jimin
AU - Yan, Dong-Ming
AU - Gabbouj, Moncef
N1 - Funding Information:
This work was supported in part by the National Natural Science Foundation of China under Grant 62002172, and Grant 61972323; and in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant 19KJB510040; and in part by the Nanjing Scientific Innovation Foundation for the Returned Overseas Chinese Scholars under Grant R2019LZ04; and in part by the Jiangsu Provincial Double-Innovation Doctor Program; and in part by the Open Project Program of the National Laboratory of Pattern Recognition (NLPR), China under Grant 202100002; and in part by the Startup Foundation for Introducing Talent of NUIST, China under Grant 2018r080. We acknowledge the High Performance Computing Center of Nanjing University of Information Science & Technology for their support of this work.
Publisher Copyright:
© 2022 Elsevier B.V.
PY - 2022/9
Y1 - 2022/9
N2 - Deep learning techniques have been extensively investigated for the purpose of further increasing the efficiency of traditional video compression. Some deep learning techniques for down/up-sampling-based video coding were found to be especially effective when the bandwidth or storage is limited. Existing works mainly differ in the super-resolution models used. Some works simply use a single image super-resolution model, ignoring the rich information in the correlation between video frames, while others explore the correlation between frames by simply concatenating the features across adjacent frames. This, however, may fail when the textures are not well aligned. In this paper, we propose to utilize neural texture transfer which exploits the semantic correlation between frames and is able to explore the correlated information even when the textures are not aligned. Meanwhile, an adaptive group of pictures (GOP) method is proposed to automatically decide whether a frame should be down-sampled or not. Experimental results show that the proposed method outperforms the standard HEVC and state-of-the-art methods under different compression configurations. When compared to standard HEVC, the BD-rate (PSNR) and BD-rate (SSIM) of the proposed method are up to -19.1% and -26.5%, respectively.
AB - Deep learning techniques have been extensively investigated for the purpose of further increasing the efficiency of traditional video compression. Some deep learning techniques for down/up-sampling-based video coding were found to be especially effective when the bandwidth or storage is limited. Existing works mainly differ in the super-resolution models used. Some works simply use a single image super-resolution model, ignoring the rich information in the correlation between video frames, while others explore the correlation between frames by simply concatenating the features across adjacent frames. This, however, may fail when the textures are not well aligned. In this paper, we propose to utilize neural texture transfer which exploits the semantic correlation between frames and is able to explore the correlated information even when the textures are not aligned. Meanwhile, an adaptive group of pictures (GOP) method is proposed to automatically decide whether a frame should be down-sampled or not. Experimental results show that the proposed method outperforms the standard HEVC and state-of-the-art methods under different compression configurations. When compared to standard HEVC, the BD-rate (PSNR) and BD-rate (SSIM) of the proposed method are up to -19.1% and -26.5%, respectively.
KW - Deep learning
KW - High-efficiency video coding (HEVC)
KW - Low bitrate
KW - Machine learning
KW - Reference-based super-resolution
KW - Video compression
U2 - 10.1016/j.image.2022.116754
DO - 10.1016/j.image.2022.116754
M3 - Article
AN - SCOPUS:85131914393
SN - 0923-5965
VL - 107
JO - Signal Processing: Image Communication
JF - Signal Processing: Image Communication
M1 - 116754
ER -