Multi-Stage Spatial and Frequency Feature Fusion using Transformer in CNN-Based In-Loop Filter for VVC

Cited by: 4

Authors:

Kathariya, Birendra [1]

Li, Zhu [1]

Wang, Hongtao [2]

Coban, Mohammad [2]

Affiliations:

[1] Univ Missouri, Kansas City, MO 64110 USA

[2] Qualcomm Technol Inc, San Diego, CA USA

Source:

2022 Picture Coding Symposium (PCS) | 2022

Keywords:

Versatile Video Coding (VVC); In-Loop Filter; Discrete Cosine Transform (DCT); Convolutional Neural Network; Transformer

DOI:

10.1109/PCS56426.2022.10017998

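Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809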

Abstract:

Versatile Video Coding (VVC)/H.266 is the successor to High Efficiency Video Coding (HEVC)/H.265 and Advanced Video Coding (AVC)/H.264, with significant technical and coding improvements. Nonetheless, it follows the same conventional block-based hybrid video coding scheme as its predecessors. As a consequence, the reconstructed picture contains compression artifacts. VVC has built-in in-loop filters to correct these artifacts, but these handcrafted filters offer suboptimal performance. In this work, we design a novel convolutional neural network (CNN) to replace the built-in in-loop filter of VVC. The proposed CNN-based in-loop filter uses a modified Spectral-wise Multi-Head Self-Attention (S-MSA) layer from the Multi-stage Spectral-wise Transformer (MST++) at multiple stages to fuse spatial features, extracted from the pixel input, with frequency-decomposed features, extracted from its discrete cosine transform (DCT). We name the proposed network MSTFNet, where the first three letters stand for MST++ and F stands for fusion. Because of the multi-stage feature fusion, the proposed CNN acts as a powerful learned in-loop filter that significantly outperforms previous methods. Our experimental results show that the proposed method achieves up to 10.31% average Bjontegaard Delta (BD)-bitrate savings under the all-intra (AI) configuration for the luma (Y) component.
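
The fusion idea in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' MSTFNet: the learned projections, multiple heads, and the paper's specific modification to S-MSA are omitted, and the choice to take queries from spatial channels and keys/values from DCT channels is an assumption made only for illustration. All function names (`dct2`, `spectral_fusion`, and so on) are hypothetical. What the sketch does show is the spectral-wise property: the attention map is computed between channels (here 2 x 2), not between pixel positions.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def dct2(block):
    """2D DCT of a square block: D * block * D^T."""
    d = dct_matrix(len(block))
    d_t = [list(r) for r in zip(*d)]
    return matmul(matmul(d, block), d_t)

def flatten(m):
    return [x for row in m for x in row]

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against all-zero channels
    return [x / norm for x in v]

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def spectral_fusion(spatial, freq):
    """Channel-wise cross-attention (assumed form, no learned weights).

    spatial: list of C_s flattened spatial channels (queries).
    freq:    list of C_f flattened DCT channels (keys and values).
    Returns (fused channels, C_s x C_f attention map).
    """
    q = [l2_normalize(c) for c in spatial]
    k = [l2_normalize(c) for c in freq]
    attn = [softmax([sum(a * b for a, b in zip(qi, kj)) for kj in k]) for qi in q]
    n = len(freq[0])
    mixed = [[sum(w * ch[p] for w, ch in zip(row, freq)) for p in range(n)]
             for row in attn]
    # Residual connection: add the attended frequency features to the spatial ones.
    fused = [[s + m for s, m in zip(sc, mc)] for sc, mc in zip(spatial, mixed)]
    return fused, attn

# Two toy 4x4 "pixel" channels and their DCT counterparts.
block_a = [[float(4 * r + c) for c in range(4)] for r in range(4)]
block_b = [[float((r + c) % 2) for c in range(4)] for r in range(4)]
spatial = [flatten(block_a), flatten(block_b)]
freq = [flatten(dct2(block_a)), flatten(dct2(block_b))]
fused, attn = spectral_fusion(spatial, freq)
```

Because attention is taken across channels, the cost of the attention map grows with the number of channels rather than with the number of pixels, which is the usual motivation for spectral-wise attention on image-sized inputs.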

Pages: 373-377

Page count: 5

