Neural Video Compression with Spatio-Temporal Cross-Covariance Transformers

Cited by: 3

Authors:

Chen, Zhenghao ^{[1,3]}

Relic, Lucas ^{[2]}

Azevedo, Roberto ^{[3]}

Zhang, Yang ^{[3]}

Gross, Markus ^{[2]}

Xu, Dong ^{[4]}

Zhou, Luping ^{[1]}

Schroers, Christopher ^{[3]}

Affiliations:

[1] Univ Sydney, Sydney, NSW, Australia

[2] Swiss Fed Inst Technol, Zurich, Switzerland

[3] DisneyRes Studios, Zurich, Switzerland

[4] Univ Hong Kong, Hong Kong, Peoples R China

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Keywords:

Video compression; neural network; transformer;

DOI:

10.1145/3581783.3611960

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Although existing neural video compression (NVC) methods have achieved significant success, most of them focus on improving either temporal or spatial information separately. They generally use simple operations such as concatenation or subtraction to utilize this information, but such operations only partially exploit spatio-temporal redundancies. This work aims to effectively and jointly leverage robust temporal and spatial information by proposing a new 3D-based transformer module: Spatio-Temporal Cross-Covariance Transformer (ST-XCT). The ST-XCT module combines two individually extracted features into a joint spatio-temporal feature, followed by 3D convolutional operations and a novel spatio-temporal-aware cross-covariance attention mechanism. Unlike conventional transformers, the cross-covariance attention mechanism is applied across the feature channels without breaking down the spatio-temporal features into local tokens. Such a design allows for modeling global cross-channel correlations of the spatio-temporal context while lowering the computational requirement. Based on ST-XCT, we introduce a novel transformer-based end-to-end optimized NVC framework. ST-XCT-based modules are integrated into various key coding components of NVC, such as feature extraction, frame reconstruction, and entropy modeling, demonstrating its generalizability. Extensive experiments show that our ST-XCT-based NVC proposal achieves state-of-the-art compression performance on various standard video benchmark datasets.
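
To make the idea of attention "across the feature channels" concrete, the following is a minimal NumPy sketch of generic cross-covariance attention in the style of cross-covariance image transformers. It is not the ST-XCT module from the paper: the function name, the projection matrices `w_q`, `w_k`, `w_v`, the `temperature` parameter, and the (T, H, W, C) feature layout are all assumptions made for illustration, and the paper's 3D convolutions and feature-combination step are omitted. The point it shows is that the attention map has size C x C, regardless of how many spatio-temporal positions the feature contains.

```python
import numpy as np


def cross_covariance_attention(x, w_q, w_k, w_v, temperature=1.0):
    """Generic channel-wise (cross-covariance) attention; illustrative only.

    x:   feature of shape (T, H, W, C) -- an assumed layout.
    w_*: hypothetical (C, C) projection matrices.
    Returns the attended feature (same shape as x) and the (C, C) attention map.
    """
    t, h, w, c = x.shape
    tokens = x.reshape(t * h * w, c)  # flatten all positions: (N, C)

    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v

    # L2-normalise every channel over the N positions so the channel-channel
    # products below are bounded, which keeps the softmax well behaved.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-8)

    # (C, C) map: entry [i, j] relates query channel i to key channel j.
    logits = (q.T @ k) / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=-1, keepdims=True)

    # Each output channel is a weighted mix of the value channels.
    out = v @ attn.T  # (N, C)
    return out.reshape(t, h, w, c), attn


rng = np.random.default_rng(0)
feat = rng.standard_normal((2, 4, 4, 8))  # toy feature: T=2, H=W=4, C=8
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
out, attn = cross_covariance_attention(feat, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

In this toy setting the attention map has 8 x 8 entries, whereas token-wise self-attention over the same feature would need an N x N map with N = 2 * 4 * 4 = 32 positions. The gap widens quickly at realistic resolutions, which is consistent with the abstract's remark about lowering the computational requirement.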


Pages: 8543-8551

Page count: 9
