End-to-End Neural Video Coding Using a Compound Spatiotemporal Representation

Cited by: 15

Authors:

Liu, Haojie [1]

Lu, Ming [1]

Chen, Zhiqi [2]

Cao, Xun [1]

Ma, Zhan [1]

Wang, Yao [2]

Affiliations:

[1] Nanjing Univ, Sch Elect Sci & Engn, Nanjing 210093, Jiangsu, Peoples R China

[2] NYU, Tandon Sch Engn, New York, NY 11201 USA

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2022 / Vol. 32 / No. 08

Funding:

National Natural Science Foundation of China

Keywords:

Image coding; Spatiotemporal phenomena; Decoding; Chemical reactors; Video coding; Feature extraction; Optical flow; Learnt video coding; spatiotemporal recurrent neural network; optical flow; deformable convolutions; video prediction; COMPRESSION;

DOI:

10.1109/TCSVT.2022.3150014

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification code:

0808 ; 0809 ;

Abstract:

Recent years have witnessed rapid advances in learnt video coding. Most algorithms have relied solely on vector-based motion representation and resampling (e.g., optical flow based bilinear sampling) to exploit inter frame redundancy. Despite the great success of adaptive kernel-based resampling (e.g., adaptive convolutions and deformable convolutions) in video prediction for uncompressed videos, integrating such approaches with rate-distortion optimization for inter frame coding has been less successful. Recognizing that each resampling solution offers unique advantages in regions with different motion and texture characteristics, we propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by these two approaches. Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module using information from the current and multiple past frames. We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements, and combine them adaptively to achieve more accurate inter prediction. Experiments show that our proposed inter coding system can provide better motion-compensated prediction and is more robust to occlusions and complex motions. Together with a jointly trained intra coder and residual coder, the overall learnt hybrid coder yields state-of-the-art coding efficiency in the low-delay scenario, compared to the traditional H.264/AVC and H.265/HEVC, as well as recently published learning-based methods, in terms of both PSNR and MS-SSIM metrics.
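
The adaptive combination described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state the combination rule, so the per-pixel convex blend below is only an assumption, and the names `hybrid_motion_compensation`, `pred_vector`, `pred_kernel`, `mode_map` and `texture` are hypothetical, not taken from the paper.

```python
import numpy as np

def hybrid_motion_compensation(pred_vector, pred_kernel, mode_map, texture=None):
    """Blend two inter predictions per pixel.

    Assumption (not from the paper): the mode selection map is a soft
    weight in [0, 1]; 1 favors the vector-based (flow-warped) prediction,
    0 favors the adaptive kernel-based prediction. An optional texture
    enhancement term is simply added to the blended result.
    """
    weight = np.clip(mode_map, 0.0, 1.0)
    blended = weight * pred_vector + (1.0 - weight) * pred_kernel
    if texture is not None:
        blended = blended + texture
    return blended

# Toy 2x2 single-channel frames standing in for the two predictions.
pred_vector = np.array([[0.2, 0.4], [0.6, 0.8]])
pred_kernel = np.array([[0.4, 0.4], [0.2, 0.0]])
# Hypothetical mode map: pure vector-based, even mix, pure kernel-based, even mix.
mode_map = np.array([[1.0, 0.5], [0.0, 0.5]])

prediction = hybrid_motion_compensation(pred_vector, pred_kernel, mode_map)
print(prediction)
```

In a learnt coder the two predictions, the map and the texture term would all come from decoder branches operating on the CSTR; here they are fixed arrays so that the blending step can be inspected in isolation.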

Pages: 5650-5662

Number of pages: 13
