End-to-End Neural Video Coding Using a Compound Spatiotemporal Representation

Cited by: 15
Authors
Liu, Haojie [1]
Lu, Ming [1]
Chen, Zhiqi [2]
Cao, Xun [1]
Ma, Zhan [1]
Wang, Yao [2]
Affiliations
[1] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210093, Jiangsu, Peoples R China
[2] NYU, Tandon Sch Engn, New York, NY 11201 USA
Funding
National Natural Science Foundation of China
Keywords
Image coding; Spatiotemporal phenomena; Decoding; Chemical reactors; Video coding; Feature extraction; Optical flow; Learnt video coding; spatiotemporal recurrent neural network; optical flow; deformable convolutions; video prediction; COMPRESSION;
DOI
10.1109/TCSVT.2022.3150014
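Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract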
Recent years have witnessed rapid advances in learnt video coding. Most algorithms have relied solely on vector-based motion representation and resampling (e.g., optical-flow-based bilinear sampling) to exploit inter-frame redundancy. Despite the great success of adaptive kernel-based resampling (e.g., adaptive convolutions and deformable convolutions) in video prediction for uncompressed videos, integrating such approaches with rate-distortion optimization for inter-frame coding has been less successful. Recognizing that each resampling solution offers unique advantages in regions with different motion and texture characteristics, we propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by these two approaches. Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module using information from the current and multiple past frames. We further design a one-to-many decoder pipeline that generates multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps, and texture enhancements, and combines them adaptively to achieve more accurate inter prediction. Experiments show that the proposed inter coding system provides better motion-compensated prediction and is more robust to occlusions and complex motions. Together with a jointly trained intra coder and residual coder, the overall learnt hybrid coder yields state-of-the-art coding efficiency in the low-delay scenario, compared with the traditional H.264/AVC and H.265/HEVC as well as recently published learning-based methods, in terms of both PSNR and MS-SSIM.
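To make the idea of adaptively combining two predictions concrete, here is a minimal sketch in plain Python. It is not the authors' learned decoder. The function names (`bilinear_sample`, `warp_with_flow`, `hybrid_blend`) are hypothetical, and so is the per-pixel convex blend with an additive texture term. The adaptive kernel-based prediction is replaced by a constant placeholder. Only the vector-based path (bilinear sampling at flow-displaced positions) and the role of a mode selection map are illustrated.

```python
import math

def bilinear_sample(frame, x, y):
    """Sample a 2-D frame (list of rows) at real-valued (x, y), clamping to the border."""
    h, w = len(frame), len(frame[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * frame[y0][x0] + ax * frame[y0][x1]
    bottom = (1 - ax) * frame[y1][x0] + ax * frame[y1][x1]
    return (1 - ay) * top + ay * bottom

def warp_with_flow(ref, flow):
    """Vector-based resampling: each output pixel reads ref at (x + dx, y + dy)."""
    return [[bilinear_sample(ref, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(len(ref[0]))] for y in range(len(ref))]

def hybrid_blend(pred_vector, pred_kernel, mode_map, texture=None):
    """Assumed blend: per-pixel convex combination plus an optional additive term."""
    out = []
    for y, row in enumerate(mode_map):
        out_row = []
        for x, m in enumerate(row):
            value = m * pred_vector[y][x] + (1.0 - m) * pred_kernel[y][x]
            if texture is not None:
                value += texture[y][x]
            out_row.append(value)
        out.append(out_row)
    return out

# Toy 2x3 reference frame and a uniform half-pixel horizontal flow.
ref = [[0.0, 10.0, 20.0],
       [30.0, 40.0, 50.0]]
flow = [[(0.5, 0.0)] * 3 for _ in range(2)]
pred_vector = warp_with_flow(ref, flow)

# Placeholder standing in for an adaptive kernel-based prediction.
pred_kernel = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]

# Mode map: 1.0 selects the vector-based prediction, 0.0 the kernel-based one.
mode_map = [[1.0, 0.5, 0.0], [1.0, 0.5, 0.0]]
prediction = hybrid_blend(pred_vector, pred_kernel, mode_map)
```

In this toy setup the mode map is fixed by hand. In the abstract's description the selection maps are produced by the decoder from the CSTR, so the weights would vary with local motion and texture.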
Pages: 5650-5662
Page count: 13