A Transformer based deep conditional video compression

Authors
Lu G. [1]
Zhong T. [1]
Geng J. [1]
Affiliation
[1] School of Computer Science and Engineering, Beijing Institute of Technology, Beijing
Funding
National Natural Science Foundation of China
Keywords
compression algorithm; deep learning; neural network; Transformer; video compression;
DOI
10.13700/j.bh.1001-5965.2022.0374
Abstract
Most recent learning-based video compression algorithms are built on convolutional neural networks (CNNs) and adopt an architecture based on motion compensation and residual coding. Because a typical CNN can exploit only local correlations, and because the prediction residual is sparse, these algorithms struggle to reach the best compression performance. To address both problems, this paper proposes a Transformer-based deep conditional video compression algorithm. The algorithm uses deformable convolution to obtain the predicted frame feature from the motion information between adjacent frames. The predicted frame feature then serves as conditional information for encoding the feature of the original input frame, which avoids directly encoding a sparse residual signal. To exploit the non-local correlation between features, the algorithm also introduces a Transformer-based autoencoder architecture for motion coding and conditional coding, which further improves compression performance. Experiments show that the proposed algorithm surpasses current mainstream learning-based video compression algorithms on both the HEVC and UVG datasets. © 2024 Beijing University of Aeronautics and Astronautics (BUAA). All rights reserved.
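The preference for conditional coding over residual coding can be illustrated with a small information-theoretic sketch. The code below is not the algorithm described in the abstract, which operates on learned features with Transformer-based autoencoders. It is a toy example on discrete symbols, and every name in it (`toy_pairs`, `residual_entropy`, `conditional_entropy`) is hypothetical. It compares the empirical entropy of the residual x - c, where c stands in for a prediction, with the empirical conditional entropy of x given c. The second quantity can never exceed the first, because H(x - c) >= H(x - c | c) = H(x | c).

```python
import math
import random
from collections import Counter


def entropy(counts):
    """Empirical Shannon entropy, in bits, of a Counter of symbol counts."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def residual_entropy(pairs):
    """H(x - c): cost of coding the residual without using c as context."""
    return entropy(Counter(x - c for x, c in pairs))


def conditional_entropy(pairs):
    """H(x | c) = H(x, c) - H(c): cost of coding x with c as context."""
    joint = Counter(pairs)
    context = Counter(c for _, c in pairs)
    return entropy(joint) - entropy(context)


def toy_pairs(n=20000, seed=0):
    """Hypothetical toy data: (current value x, predicted value c) pairs."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        c = rng.randrange(8)
        # Assumption for illustration: the prediction is exact when c < 4
        # and noisy otherwise, so its reliability depends on the context.
        spread = 0 if c < 4 else 3
        x = c + rng.randint(-spread, spread)
        pairs.append((x, c))
    return pairs


pairs = toy_pairs()
h_res = residual_entropy(pairs)
h_cond = conditional_entropy(pairs)
print(f"residual coding    H(x - c) = {h_res:.3f} bits")
print(f"conditional coding H(x | c) = {h_cond:.3f} bits")
```

In this toy setting the gap appears because the reliability of the prediction varies with the context: a residual coder must use one distribution for all residuals, while a conditional coder can adapt its distribution to each value of c.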
Pages: 442-448
Page count: 6