TY - JOUR
T1 - Neural texture transfer assisted video coding with adaptive up-sampling
AU - Yu, Li
AU - Chang, Wenshuai
AU - Quan, Weize
AU - Xiao, Jimin
AU - Yan, Dong Ming
AU - Gabbouj, Moncef
N1 - Publisher Copyright: © 2022 Elsevier B.V.
PY - 2022/9
Y1 - 2022/9
AB - Deep learning techniques have been extensively investigated for the purpose of further increasing the efficiency of traditional video compression. Some deep learning techniques for down/up-sampling-based video coding were found to be especially effective when the bandwidth or storage is limited. Existing works mainly differ in the super-resolution models used. Some works simply use a single image super-resolution model, ignoring the rich information in the correlation between video frames, while others explore the correlation between frames by simply concatenating the features across adjacent frames. This, however, may fail when the textures are not well aligned. In this paper, we propose to utilize neural texture transfer which exploits the semantic correlation between frames and is able to explore the correlated information even when the textures are not aligned. Meanwhile, an adaptive group of pictures (GOP) method is proposed to automatically decide whether a frame should be down-sampled or not. Experimental results show that the proposed method outperforms the standard HEVC and state-of-the-art methods under different compression configurations. When compared to standard HEVC, the BD-rate (PSNR) and BD-rate (SSIM) of the proposed method are up to -19.1% and -26.5%, respectively.
KW - Deep learning
KW - High-efficiency video coding (HEVC)
KW - Low bitrate
KW - Machine learning
KW - Reference-based super-resolution
KW - Video compression
UR - http://www.scopus.com/inward/record.url?scp=85131914393&partnerID=8YFLogxK
U2 - 10.1016/j.image.2022.116754
DO - 10.1016/j.image.2022.116754
M3 - Article
AN - SCOPUS:85131914393
SN - 0923-5965
VL - 107
JO - Signal Processing: Image Communication
JF - Signal Processing: Image Communication
M1 - 116754
ER -