End-to-End Learning for Video Frame Compression with Self-Attention

Cited by: 3
Authors
Zou, Nannan [2]
Zhang, Honglei [1]
Cricri, Francesco [1]
Tavakoli, Hamed R. [1]
Lainema, Jani [1]
Aksu, Emre [1]
Hannuksela, Miska [1]
Rahtu, Esa [2]
Affiliations
[1] Nokia Technol, Espoo, Finland
[2] Tampere Univ, Tampere, Finland
DOI
10.1109/CVPRW50498.2020.00079
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
One of the core components of conventional (i.e., non-learned) video codecs is the prediction of a frame from a previously decoded frame by leveraging temporal correlations. In this paper, we propose an end-to-end learned system for compressing video frames. Instead of relying on pixel-space motion (as with optical flow), our system learns deep embeddings of frames and encodes their difference in latent space. At the decoder side, an attention mechanism attends to the latent space of frames to decide how different parts of the previous and current frames are combined to form the final predicted current frame. Spatially varying channel allocation is achieved by using importance masks acting on the feature channels. The model is trained to reduce the bitrate by minimizing a loss on the importance maps and a loss on the probability output by a context model for arithmetic coding. In our experiments, we show that the proposed system achieves high compression rates and high objective visual quality as measured by MS-SSIM and PSNR. Furthermore, we provide ablation studies that highlight the contribution of the different components.
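The three ideas named in the abstract (a latent-space difference, importance masks over feature channels, and a decoder-side blend of previous and current frame features) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the clipped-ramp mask formula, and the per-location softmax blend are all assumptions chosen for illustration, and the learned encoder, decoder, quantizer, and context model are omitted entirely.

```python
import numpy as np


def importance_mask(importance, num_channels):
    """Expand an (H, W) importance map in [0, 1] into a (C, H, W) mask.

    Assumed formula (not taken from the paper): channel k at a location is
    fully kept when importance * C >= k + 1, fully dropped when
    importance * C <= k, and partially kept in between.
    """
    scaled = importance[None, :, :] * num_channels              # (1, H, W)
    k = np.arange(num_channels, dtype=float)[:, None, None]     # (C, 1, 1)
    return np.clip(scaled - k, 0.0, 1.0)


def encode_latent_difference(z_prev, z_cur, importance):
    """Mask the latent-space difference between two (C, H, W) embeddings."""
    residual = z_cur - z_prev
    return residual * importance_mask(importance, residual.shape[0])


def attention_blend(z_prev, z_cur_hat, scores):
    """Blend two (C, H, W) latents with per-location weights.

    `scores` has shape (2, H, W); a softmax over the first axis gives the
    weight of the previous-frame and current-frame latents at each location.
    This stands in for a learned attention module (hypothetical here).
    """
    e = np.exp(scores - scores.max(axis=0, keepdims=True))
    w = e / e.sum(axis=0, keepdims=True)
    return w[0][None] * z_prev + w[1][None] * z_cur_hat
```

In this sketch, a location with importance 0 transmits no channels of the latent difference, while a location with importance 1 transmits all of them, which is one simple way to obtain a spatially varying channel allocation.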
Pages: 580-584
Number of pages: 5
Related papers
50 items in total
  • [21] IMPROVING MANDARIN END-TO-END SPEECH SYNTHESIS BY SELF-ATTENTION AND LEARNABLE GAUSSIAN BIAS
    Yang, Fengyu
    Yang, Shan
    Zhu, Pengcheng
    Yan, Pengju
    Xie, Lei
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 208 - 213
  • [22] Adversarial joint training with self-attention mechanism for robust end-to-end speech recognition
    Lujun Li
    Yikai Kang
    Yuchen Shi
    Ludwig Kürzinger
    Tobias Watzel
    Gerhard Rigoll
    EURASIP Journal on Audio, Speech, and Music Processing, 2021
  • [23] SAN-M: Memory Equipped Self-Attention for End-to-End Speech Recognition
    Gao, Zhifu
    Zhang, Shiliang
    Lei, Ming
    McLoughlin, Ian
    INTERSPEECH 2020, 2020, : 6 - 10
  • [24] An End-to-end Topic-Enhanced Self-Attention Network for Social Emotion Classification
    Wang, Chang
    Wang, Bang
    WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), 2020, : 2210 - 2219
  • [25] End-to-End Learning of Video Compression Using Spatio-Temporal Autoencoders
    Pessoa, Jorge
    Aidos, Helena
    Tomas, Pedro
    Figueiredo, Mario A. T.
    2020 IEEE WORKSHOP ON SIGNAL PROCESSING SYSTEMS (SIPS), 2020, : 276 - 281
  • [26] Learning-based End-to-End Video Compression Using Predictive Coding
    de Oliveira, Matheus C.
    Martins, Luiz G. R.
    Jung, Henrique Costa
    Guerin Jr, Nilson Donizete
    da Silva, Renam Castro
    Peixoto, Eduardo
    Macchiavello, Bruno
    Hung, Edson M.
    Testoni, Vanessa
    Freitas, Pedro Garcia
    2021 34TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES (SIBGRAPI 2021), 2021, : 160 - 167
  • [27] An End-to-End Blind Image Quality Assessment Method Using a Recurrent Network and Self-Attention
    Zhou, Mingliang
    Lan, Xuting
    Wei, Xuekai
    Liao, Xingran
    Mao, Qin
    Li, Yutong
    Wu, Chao
    Xiang, Tao
    Fang, Bin
    IEEE TRANSACTIONS ON BROADCASTING, 2023, 69 (02) : 369 - 377
  • [28] Reinforcement-Tracking: An End-to-End Trajectory Tracking Method Based on Self-Attention Mechanism
    Zhao, Guanglei
    Chen, Zihao
    Liao, Weiming
    INTERNATIONAL JOURNAL OF AUTOMOTIVE TECHNOLOGY, 2024, 25 (03) : 541 - 551
  • [29] Reinforcement-Tracking: An End-to-End Trajectory Tracking Method Based on Self-Attention Mechanism
    Guanglei Zhao
    Zihao Chen
    Weiming Liao
    International Journal of Automotive Technology, 2024, 25 : 541 - 551
  • [30] Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition
    Moritz, Niko
    Hori, Takaaki
    Le Roux, Jonathan
    INTERSPEECH 2021, 2021, : 1822 - 1826