End-to-End Neural Video Coding Using a Compound Spatiotemporal Representation

Cited by: 15
Authors
Liu, Haojie [1 ]
Lu, Ming [1 ]
Chen, Zhiqi [2 ]
Cao, Xun [1 ]
Ma, Zhan [1 ]
Wang, Yao [2 ]
Affiliations
[1] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210093, Jiangsu, Peoples R China
[2] NYU, Tandon Sch Engn, New York, NY 11201 USA
Funding
National Natural Science Foundation of China;
Keywords
Image coding; Spatiotemporal phenomena; Decoding; Chemical reactors; Video coding; Feature extraction; Optical flow; Learnt video coding; spatiotemporal recurrent neural network; optical flow; deformable convolutions; video prediction; COMPRESSION;
DOI
10.1109/TCSVT.2022.3150014
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline codes
0808 ; 0809 ;
Abstract
Recent years have witnessed rapid advances in learnt video coding. Most algorithms have relied solely on vector-based motion representation and resampling (e.g., optical-flow-based bilinear sampling) to exploit inter-frame redundancy. Despite the great success of adaptive kernel-based resampling (e.g., adaptive convolutions and deformable convolutions) in video prediction for uncompressed videos, integrating such approaches with rate-distortion optimization for inter-frame coding has been less successful. Recognizing that each resampling solution offers unique advantages in regions with different motion and texture characteristics, we propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by these two approaches. Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module using information from the current and multiple past frames. We further design a one-to-many decoder pipeline that generates multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation-mode selection maps, and texture enhancements, and combines them adaptively to achieve more accurate inter prediction. Experiments show that the proposed inter coding system provides better motion-compensated prediction and is more robust to occlusions and complex motion. Together with a jointly trained intra coder and residual coder, the overall learnt hybrid coder achieves state-of-the-art coding efficiency in the low-delay scenario, compared with traditional H.264/AVC and H.265/HEVC as well as recently published learning-based methods, in terms of both PSNR and MS-SSIM.
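The core idea described in the abstract can be sketched in a few lines: two motion-compensated predictions of the current frame (one from vector-based warping with an optical-flow field, one from per-pixel adaptive kernels) are blended by a per-pixel mode-selection map. This is a minimal toy sketch of that blending step only, not the paper's implementation; all function names, the 3x3 kernel size, and the toy inputs are illustrative assumptions, and the learnt networks that would produce the flow, kernels, and selection map are omitted.

```python
# Toy sketch of hybrid motion compensation (HMC): blend a vector-based
# prediction and an adaptive kernel-based prediction with a selection map.
# Frames are plain nested lists of floats; everything here is illustrative.

def bilinear_sample(frame, y, x):
    """Vector-based resampling: sample `frame` at real-valued (y, x)."""
    h, w = len(frame), len(frame[0])
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = frame[y0][x0] * (1 - dx) + frame[y0][x1] * dx
    bot = frame[y1][x0] * (1 - dx) + frame[y1][x1] * dx
    return top * (1 - dy) + bot * dy

def vector_prediction(ref, flow):
    """Warp the reference frame with a dense optical-flow field (dy, dx)."""
    h, w = len(ref), len(ref[0])
    return [[bilinear_sample(ref, i + flow[i][j][0], j + flow[i][j][1])
             for j in range(w)] for i in range(h)]

def kernel_prediction(ref, kernels):
    """Adaptive kernel-based resampling: a 3x3 kernel per output pixel."""
    h, w = len(ref), len(ref[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii = min(max(i + di, 0), h - 1)   # clamp at borders
                    jj = min(max(j + dj, 0), w - 1)
                    acc += kernels[i][j][di + 1][dj + 1] * ref[ii][jj]
            out[i][j] = acc
    return out

def hybrid_prediction(ref, flow, kernels, mode_map):
    """Blend both predictions per pixel; mode_map values lie in [0, 1]."""
    pv = vector_prediction(ref, flow)
    pk = kernel_prediction(ref, kernels)
    h, w = len(ref), len(ref[0])
    return [[mode_map[i][j] * pv[i][j] + (1 - mode_map[i][j]) * pk[i][j]
             for j in range(w)] for i in range(h)]

# Toy check: with zero flow and identity kernels (1 at the centre tap),
# both branches reproduce `ref`, so the blended prediction does too.
ref = [[float(3 * i + j) for j in range(3)] for i in range(3)]
zero_flow = [[(0.0, 0.0)] * 3 for _ in range(3)]
ident = [[[[1.0 if (a, b) == (1, 1) else 0.0 for b in range(3)]
           for a in range(3)] for _ in range(3)] for _ in range(3)]
half = [[0.5] * 3 for _ in range(3)]
pred = hybrid_prediction(ref, zero_flow, ident, half)
```

In the paper's actual system these inputs are produced by the learnt decoder pipeline (the flow, kernels, and selection maps are decoded from the CSTR), and the blend is followed by texture enhancement; the sketch only shows why a per-pixel selection map lets the coder pick whichever resampling mode predicts a region better.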
Pages: 5650-5662
Page count: 13
Related papers
50 records total
  • [31] MPNET: An End-to-End Deep Neural Network for Object Detection in Surveillance Video
    Wang, Hanyu
    Wang, Ping
    Qian, Xueming
    IEEE ACCESS, 2018, 6 : 30296 - 30308
  • [32] End-to-end video subtitle recognition via a deep Residual Neural Network
    Yan, Hongyu
    Xu, Xin
    PATTERN RECOGNITION LETTERS, 2020, 131 : 368 - 375
  • [33] An Automated End-To-End Pipeline for Fine-Grained Video Annotation using Deep Neural Networks
    Vandersmissen, Baptist
    Sterckx, Lucas
    Demeester, Thomas
    Jalalvand, Azarakhsh
    De Neve, Wesley
    Van de Walle, Rik
    ICMR'16: PROCEEDINGS OF THE 2016 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2016, : 409 - 412
  • [34] Improving End-to-End Sign Language Translation With Adaptive Video Representation Enhanced Transformer
    Liu, Zidong
    Wu, Jiasong
    Shen, Zeyu
    Chen, Xin
    Wu, Qianyu
    Gui, Zhiguo
    Senhadji, Lotfi
    Shu, Huazhong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (09) : 8327 - 8342
  • [35] End-to-End Exposure Fusion Using Convolutional Neural Network
    Wang, Jinhua
    Wang, Weiqiang
    Xu, Guangmei
    Liu, Hongzhe
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (02): : 560 - 563
  • [36] End-to-End Training for Compound Expression Recognition
    Li, Hongfei
    Li, Qing
    SENSORS, 2020, 20 (17) : 1 - 25
  • [37] End-to-end Spatiotemporal Attention Model for Autonomous Driving
    Zhao, Ruijie
    Zhang, Yanxin
    Huang, Zhiqing
    Yin, Chenkun
    PROCEEDINGS OF 2020 IEEE 4TH INFORMATION TECHNOLOGY, NETWORKING, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (ITNEC 2020), 2020, : 2649 - 2653
  • [38] End-to-End Transport for Video QoE Fairness
    Nathan, Vikram
    Sivaraman, Vibhaalakshmi
    Addanki, Ravichandra
    Khani, Mehrdad
    Goyal, Prateesh
    Alizadeh, Mohammad
    SIGCOMM '19 - PROCEEDINGS OF THE ACM SPECIAL INTEREST GROUP ON DATA COMMUNICATION, 2019, : 408 - 423
  • [39] End-to-End Video Instance Segmentation with Transformers
    Wang, Yuqing
    Xu, Zhaoliang
    Wang, Xinlong
    Shen, Chunhua
    Cheng, Baoshan
    Shen, Hao
    Xia, Huaxia
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 8737 - 8746
  • [40] End-to-end stereoscopic video streaming system
    Pehlivan, Selen
    Aksay, Anil
    Bilen, Cagdas
    Akar, Gozde Bozdagi
    Civanlar, M. Reha
    2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 2169 - 2172