Multi-Stage Spatial and Frequency Feature Fusion using Transformer in CNN-Based In-Loop Filter for VVC

Cited by: 4

Authors:

Kathariya, Birendra [1]

Li, Zhu [1]

Wang, Hongtao [2]

Coban, Mohammad [2]

Affiliations:

[1] Univ Missouri, Kansas City, MO 64110 USA

[2] Qualcomm Technol Inc, San Diego, CA USA

Source:

2022 PICTURE CODING SYMPOSIUM (PCS) | 2022

Keywords:

Versatile Video Coding (VVC); In-Loop Filter; Discrete Cosine Transform (DCT); Convolutional Neural Network; Transformer;

DOI:

10.1109/PCS56426.2022.10017998

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Versatile Video Coding (VVC)/H.266 is the successor to High Efficiency Video Coding (HEVC)/H.265 and Advanced Video Coding (AVC)/H.264, with significant technical and coding improvements. Nonetheless, it follows the same conventional block-based hybrid video coding scheme as its predecessors. As a consequence, the reconstructed picture contains compression artifacts. VVC has in-loop filters by default to correct these artifacts, but the handcrafted filters offer suboptimal performance. In this work, we design a novel convolutional neural network (CNN) to replace the built-in in-loop filter of VVC. The proposed CNN-based in-loop filter uses a modified Spectral-wise Multi-Head Self-Attention (S-MSA) layer of the Multi-stage Spectral-wise Transformer (MST++) at multiple stages to fuse spatial features and frequency-decomposed features, extracted from the pixel-domain input and from its discrete cosine transform (DCT), respectively. We name the proposed network MSTFNet, where the first three letters represent MST++ and F stands for fusion. Because of the multi-stage feature fusion operation, the proposed CNN acts as a powerful learned in-loop filter that significantly outperforms previous methods. Our experimental results show that the proposed method achieves average Bjontegaard Delta (BD)-Bitrate savings of up to 10.31% under the all-intra (AI) configuration for the luma (Y) component.
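
The fusion idea in the abstract can be illustrated with a toy NumPy sketch. It is not the authors' MSTFNet: the abstract gives no layer sizes, head counts, or details of the S-MSA modification, so everything here is an assumption made for illustration. The helper names (`dct_matrix`, `block_dct2`, `spectral_attention`), the 8 x 8 block size, the 4-channel toy features, the single attention head, and the random projection weights are all hypothetical. The sketch only shows the general pattern: compute a 2-D DCT of each spatial feature map, concatenate spatial and frequency features along the channel axis, and apply channel-wise (spectral-wise) attention, where the attention map has size channels x channels rather than tokens x tokens.

```python
import numpy as np


def dct_matrix(n):
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)  # DC row has its own scale factor
    return d


def block_dct2(block):
    """Separable 2-D DCT-II of one block (rows, then columns)."""
    rows, cols = block.shape
    return dct_matrix(rows) @ block @ dct_matrix(cols).T


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def spectral_attention(x, wq, wk, wv):
    """Single-head channel-wise attention on x of shape (tokens, channels).

    Simplified assumption: one head, no learned temperature, no output
    projection. Each channel is treated as one attention token.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    # L2-normalise every channel over the token axis
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-8)
    attn = softmax(k.T @ q, axis=-1)  # shape (channels, channels)
    return v @ attn.T


# Toy data (hypothetical): an 8 x 8 feature map with 4 spatial channels.
rng = np.random.default_rng(0)
spatial = rng.standard_normal((8, 8, 4))
# Frequency branch: 2-D DCT of every channel map.
freq = np.stack([block_dct2(spatial[:, :, c]) for c in range(4)], axis=-1)
# Fuse by concatenating along channels, then flatten positions into tokens.
tokens = np.concatenate([spatial, freq], axis=-1).reshape(64, 8)
wq, wk, wv = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))
fused = spectral_attention(tokens, wq, wk, wv)
print(fused.shape)  # (64, 8)
```

Because attention is computed across channels, the cost of the attention map grows with the number of channels rather than with the number of pixels, which is one reason this style of attention is attractive for picture-sized inputs.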


Pages: 373-377

Page count: 5
