End-to-End Learning for Video Frame Compression with Self-Attention

Cited by: 3
Authors
Zou, Nannan [2 ]
Zhang, Honglei [1 ]
Cricri, Francesco [1 ]
Tavakoli, Hamed R. [1 ]
Lainema, Jani [1 ]
Aksu, Emre [1 ]
Hannuksela, Miska [1 ]
Rahtu, Esa [2 ]
Affiliations
[1] Nokia Technol, Espoo, Finland
[2] Tampere Univ, Tampere, Finland
Keywords
DOI
10.1109/CVPRW50498.2020.00079
CLC number
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the core components of conventional (i.e., non-learned) video codecs is the prediction of a frame from a previously decoded frame by leveraging temporal correlations. In this paper, we propose an end-to-end learned system for compressing video frames. Instead of relying on pixel-space motion (as with optical flow), our system learns deep embeddings of frames and encodes their difference in latent space. On the decoder side, an attention mechanism attends to the latent representations of frames to decide how different parts of the previous and current frame are combined to form the final predicted current frame. Spatially-varying channel allocation is achieved by using importance masks acting on the feature channels. The model is trained to reduce the bitrate by minimizing a loss on importance maps and a loss on the probability output by a context model for arithmetic coding. In our experiments, we show that the proposed system achieves high compression rates and high objective visual quality as measured by MS-SSIM and PSNR. Furthermore, we provide ablation studies that highlight the contribution of different components.
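The spatially-varying channel allocation described in the abstract can be illustrated with a short sketch. This is not the authors' code: the function name `importance_mask_allocation` and the concrete rule (keep channel c at location (h, w) iff c < ceil(m[h, w] * C)) are assumptions, written in the common style of importance-map-based learned compression, where salient regions are assigned more latent channels than flat regions.

```python
import numpy as np

def importance_mask_allocation(z, m):
    """Zero out feature channels beyond a per-pixel budget set by an
    importance map m with values in [0, 1].

    Assumed shapes: z is a latent tensor (C, H, W); m is (H, W).
    Channel c at location (h, w) is kept iff c < ceil(m[h, w] * C),
    so high-importance pixels keep more channels (more bits).
    """
    C = z.shape[0]
    budget = np.ceil(m * C)                  # channels kept per pixel, (H, W)
    chan = np.arange(C).reshape(C, 1, 1)     # channel index grid, (C, 1, 1)
    mask = (chan < budget).astype(z.dtype)   # binary keep-mask, (C, H, W)
    return z * mask

# Toy latent with 8 channels; importance decays from left to right,
# so the rightmost column is allocated zero channels.
rng = np.random.default_rng(0)
z = rng.standard_normal((8, 4, 4))
m = np.array([[1.0, 0.5, 0.25, 0.0]] * 4)
out = importance_mask_allocation(z, m)
```

In a full codec, only the surviving channels would be quantized and entropy-coded (here, with probabilities from a context model feeding an arithmetic coder), which is how the mask translates into a spatially-varying bitrate.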
Pages: 580 - 584
Page count: 5
Related papers
50 records in total
  • [41] CASA-Net: Cross-attention and Self-attention for End-to-End Audio-visual Speaker Diarization
    Zhou, Haodong
    Li, Tao
    Wang, Jie
    Li, Lin
    Hong, Qingyang
    2023 ASIA PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE, APSIPA ASC, 2023, : 102 - 106
  • [42] Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention
    Liang, Chengdong
    Xu, Menglong
    Zhang, Xiao-Lei
    INTERSPEECH 2021, 2021, 2 : 1495 - 1499
  • [43] Self-Attention Channel Combinator Frontend for End-to-End Multichannel Far-field Speech Recognition
    Gong, Rong
    Quillen, Carl
    Sharma, Dushyant
    Goderre, Andrew
    Lainez, Jose
    Milanovic, Ljubomir
    INTERSPEECH 2021, 2021, : 3840 - 3844
  • [44] A Novel End-to-end Network Based on a bidirectional GRU and a Self-Attention Mechanism for Denoising of Electroencephalography Signals
    Wang, Wenlong
    Li, Baojiang
    Wang, Haiyan
    NEUROSCIENCE, 2022, 505 : 10 - 20
  • [45] Self-Distillation into Self-Attention Heads for Improving Transformer-based End-to-End Neural Speaker Diarization
    Jeoung, Ye-Rin
    Choi, Jeong-Hwan
    Seong, Ju-Seok
    Kyung, JeHyun
    Chang, Joon-Hyuk
    INTERSPEECH 2023, 2023, : 3197 - 3201
  • [46] SWINBERT: End-to-End Transformers with Sparse Attention for Video Captioning
    Lin, Kevin
    Li, Linjie
    Lin, Chung-Ching
    Ahmed, Faisal
    Gan, Zhe
    Liu, Zicheng
    Lu, Yumao
    Wang, Lijuan
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17928 - 17937
  • [47] Attention Based End-to-End Network for Short Video Classification
    Zhu, Hui
    Zou, Chao
    Wang, Zhenyu
    Xu, Kai
    Huang, Zihao
    2022 18TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING, MSN, 2022, : 490 - 494
  • [48] End-to-End Multi-Task Learning with Attention
    Liu, Shikun
    Johns, Edward
    Davison, Andrew J.
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 1871 - 1880
  • [49] Learning End-to-End Lossy Image Compression: A Benchmark
    Hu, Yueyu
    Yang, Wenhan
    Ma, Zhan
    Liu, Jiaying
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (08) : 4194 - 4211
  • [50] End-to-end frame-rate adaptive streaming of video data
    Fung, CW
    Liew, SC
    IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA COMPUTING AND SYSTEMS, PROCEEDINGS VOL 2, 1999, : 67 - 71