End-to-End Learning for Video Frame Compression with Self-Attention

Cited by: 3
Authors
Zou, Nannan [2 ]
Zhang, Honglei [1 ]
Cricri, Francesco [1 ]
Tavakoli, Hamed R. [1 ]
Lainema, Jani [1 ]
Aksu, Emre [1 ]
Hannuksela, Miska [1 ]
Rahtu, Esa [2 ]
Affiliations
[1] Nokia Technol, Espoo, Finland
[2] Tampere Univ, Tampere, Finland
DOI
10.1109/CVPRW50498.2020.00079
CLC number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the core components of conventional (i.e., non-learned) video codecs is the prediction of a frame from a previously decoded frame by leveraging temporal correlations. In this paper, we propose an end-to-end learned system for compressing video frames. Instead of relying on pixel-space motion (as with optical flow), our system learns deep embeddings of frames and encodes their difference in latent space. On the decoder side, an attention mechanism attends to the latent representations of the frames to decide how different parts of the previous and current frame are combined to form the final predicted current frame. Spatially-varying channel allocation is achieved by using importance masks acting on the feature channels. The model is trained to reduce the bitrate by minimizing a loss on the importance maps and a loss on the probabilities output by a context model for arithmetic coding. In our experiments, we show that the proposed system achieves high compression rates and high objective visual quality as measured by MS-SSIM and PSNR. Furthermore, we provide ablation studies highlighting the contribution of the different components.
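The abstract describes three ideas that can be sketched concretely: encoding the frame difference in latent space rather than pixel space, gating latent channels with an importance mask, and blending the previous and reconstructed latents with attention-style weights on the decoder side. The following minimal NumPy sketch illustrates those mechanisms under loud assumptions: the linear "embedding", the sigmoid importance mask, and the per-channel attention scores are illustrative stand-ins, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed(frame, W):
    # Toy "deep embedding": one linear map standing in for a CNN encoder.
    return frame @ W

# Hypothetical shapes: frames of 8 pixels, 4 latent channels.
W_enc = rng.normal(size=(8, 4))
prev_frame = rng.normal(size=(8,))
curr_frame = rng.normal(size=(8,))

z_prev = embed(prev_frame, W_enc)
z_curr = embed(curr_frame, W_enc)

# Encode the residual in latent space instead of pixel-space motion.
z_diff = z_curr - z_prev

# Spatially/channel-varying allocation: a mask in [0, 1] scales each
# channel, so low-importance channels carry (and cost) less.
importance = 1.0 / (1.0 + np.exp(-z_diff))  # sigmoid mask (illustrative)
z_sent = z_diff * importance

# Decoder side: reconstruct the current latent from the previous one
# plus the masked residual.
z_rec = z_prev + z_sent

# Attention-style blend: softmax over the two candidate latents decides,
# per channel, how previous and current information are combined.
scores = np.stack([z_prev * z_rec, z_rec * z_rec])       # toy scores, shape (2, 4)
weights = np.exp(scores) / np.exp(scores).sum(axis=0)    # softmax over the 2 frames
z_pred = weights[0] * z_prev + weights[1] * z_rec        # predicted current latent
```

In the actual system the mask and attention weights are produced by learned sub-networks and trained jointly with the rate losses mentioned in the abstract; this sketch only shows how the pieces compose.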
Pages: 580-584
Page count: 5
Related papers
50 records in total
  • [1] End-to-End ASR with Adaptive Span Self-Attention
    Chang, Xuankai
    Subramanian, Aswin Shanmugam
    Guo, Pengcheng
    Watanabe, Shinji
    Fujita, Yuya
    Omachi, Motoi
    INTERSPEECH 2020, 2020, : 3595 - 3599
  • [2] End-to-End Neural Speaker Diarization with Self-Attention
    Fujita, Yusuke
    Kanda, Naoyuki
    Horiguchi, Shota
    Xue, Yawen
    Nagamatsu, Kenji
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 296 - 303
  • [3] Self-Attention Transducers for End-to-End Speech Recognition
    Tian, Zhengkun
    Yi, Jiangyan
    Tao, Jianhua
    Bai, Ye
    Wen, Zhengqi
    INTERSPEECH 2019, 2019, : 4395 - 4399
  • [4] An End-to-End Learning Framework for Video Compression
    Lu, Guo
    Zhang, Xiaoyun
    Ouyang, Wanli
    Chen, Li
    Gao, Zhiyong
    Xu, Dong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (10) : 3292 - 3308
  • [5] End-to-End Speech Summarization Using Restricted Self-Attention
    Sharma, Roshan
    Palaskar, Shruti
    Black, Alan W.
    Metze, Florian
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8072 - 8076
  • [6] Efficient decoding self-attention for end-to-end speech synthesis
    Zhao, Wei
    Xu, Li
    FRONTIERS OF INFORMATION TECHNOLOGY & ELECTRONIC ENGINEERING, 2022, 23 (07) : 1127 - 1138
  • [7] On the localness modeling for the self-attention based end-to-end speech synthesis
    Yang, Shan
    Lu, Heng
    Kang, Shiyin
    Xue, Liumeng
    Xiao, Jinba
    Su, Dan
    Xie, Lei
    Yu, Dong
    Neural Networks, 2020, 125 : 121 - 130
  • [8] Very Deep Self-Attention Networks for End-to-End Speech Recognition
    Ngoc-Quan Pham
    Thai-Son Nguyen
    Niehues, Jan
    Mueller, Markus
    Waibel, Alex
    INTERSPEECH 2019, 2019, : 66 - 70
  • [9] End-to-end Parking Behavior Recognition Based on Self-attention Mechanism
    Li, Penghua
    Zhu, Dechen
    Mou, Qiyun
    Tu, Yushan
    Wu, Jinfeng
    2023 2ND ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING, CACML 2023, 2023, : 371 - 376