Multi-Stage Spatial and Frequency Feature Fusion using Transformer in CNN-Based In-Loop Filter for VVC

Cited by: 4
Authors
Kathariya, Birendra [1]
Li, Zhu [1]
Wang, Hongtao [2]
Coban, Mohammad [2]
Affiliations
[1] University of Missouri-Kansas City, Kansas City, MO 64110, USA
[2] Qualcomm Technologies Inc., San Diego, CA, USA
Keywords
Versatile Video Coding (VVC); In-Loop Filter; Discrete Cosine Transform (DCT); Convolutional Neural Network; Transformer
DOI
10.1109/PCS56426.2022.10017998
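Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification code
0808; 0809
Abstract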
Versatile Video Coding (VVC)/H.266 is the successor to High Efficiency Video Coding (HEVC)/H.265 and Advanced Video Coding (AVC)/H.264, with significant technical and coding-efficiency improvements. Nonetheless, like its predecessors, it follows the conventional block-based hybrid video coding scheme, and as a consequence the reconstructed picture contains compression artifacts. VVC includes in-loop filters by default to correct these distortions, but these handcrafted filters offer suboptimal performance. In this work, we design a novel convolutional neural network (CNN) to replace the built-in in-loop filter of VVC. The proposed CNN-based in-loop filter uses a modified Spectral-wise Multi-Head Self-Attention (S-MSA) layer from the Multi-stage Spectral-wise Transformer (MST++) at multiple stages to fuse spatial features extracted from the pixel-domain input with frequency-decomposed features extracted from its discrete cosine transform (DCT). We name the proposed network MSTFNet, where the first three letters stand for MST++ and F stands for fusion. Owing to this multi-stage feature fusion, the proposed CNN acts as a powerful learned in-loop filter that significantly outperforms previous methods. Experimental results show that the proposed method achieves average Bjøntegaard Delta (BD)-Bitrate savings of up to 10.31% for the luma (Y) component under the all-intra (AI) configuration.
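The abstract does not specify how the DCT-domain input is organized. A common layout in frequency-domain CNNs applies an orthonormal 2-D DCT-II to non-overlapping B x B blocks and gathers the coefficient at frequency (u, v) from every block into one channel, giving B*B frequency channels at 1/B resolution. The sketch below assumes that layout and a hypothetical block size B = 8; `dct_to_channels` and its helpers are illustrative names, not taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major 2-D array


def dct_matrix(n: int) -> Matrix:
    """Orthonormal DCT-II basis: C[k][i] = a(k) * cos(pi * (2i + 1) * k / (2n))."""
    c = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        c.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)])
    return c


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def dct2(block: Matrix) -> Matrix:
    """2-D DCT-II of a square block: Y = C X C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def dct_to_channels(frame: Matrix, b: int = 8) -> List[Matrix]:
    """Split the frame into b x b blocks, transform each block, and place
    coefficient (u, v) of every block into channel u * b + v.
    Returns b*b maps of size (H/b) x (W/b)."""
    h, w = len(frame), len(frame[0])
    assert h % b == 0 and w % b == 0, "frame size must be a multiple of the block size"
    hb, wb = h // b, w // b
    channels = [[[0.0] * wb for _ in range(hb)] for _ in range(b * b)]
    for by in range(hb):
        for bx in range(wb):
            block = [row[bx * b:(bx + 1) * b] for row in frame[by * b:(by + 1) * b]]
            coeffs = dct2(block)
            for u in range(b):
                for v in range(b):
                    channels[u * b + v][by][bx] = coeffs[u][v]
    return channels
```

For a flat 16 x 16 frame of value 128, channel 0 (the DC map) is 2 x 2 with every entry equal to 128 * 8 = 1024, and all 63 AC channels are zero up to rounding. Because the transform is orthonormal, the energy of each block is preserved, so a network can consume the frequency channels without any rescaling.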
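S-MSA in MST++ treats each channel (spectral band) as a token: per head, the channel vectors of K and Q are L2-normalized over all pixels, a d x d channel-to-channel attention map is formed, scaled by a learnable factor, and softmax-normalized, and each output channel is a weighted sum of the V channels. The cost is therefore linear in the number of pixels rather than quadratic. The pure-Python sketch below shows only this core idea. It assumes identity Q/K/V projections and a fixed per-head scale `sigma`, omits MST++'s positional embedding and output projection, and does not reproduce whatever modifications MSTFNet makes; `spectral_wise_attention` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]  # shape (HW, C): one row per pixel, one column per channel


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def l2_normalize(vec: List[float]) -> List[float]:
    n = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid division by zero
    return [v / n for v in vec]


def spectral_wise_attention(x: Matrix, num_heads: int, sigma: List[float]) -> Matrix:
    """Channel-wise ("spectral-wise") self-attention in the spirit of MST++'s S-MSA.

    Assumption for brevity: Q = K = V = x (identity projections).
    Each head covers d = C / num_heads channels, so its attention map is d x d.
    """
    hw, c = len(x), len(x[0])
    assert c % num_heads == 0, "channels must split evenly across heads"
    d = c // num_heads
    out = [[0.0] * c for _ in range(hw)]
    for h in range(num_heads):
        cols = range(h * d, (h + 1) * d)
        # Each channel as a vector over all HW pixels, L2-normalized (used for Q and K).
        qk = [l2_normalize([x[p][ch] for p in range(hw)]) for ch in cols]
        v = [[x[p][ch] for p in range(hw)] for ch in cols]  # V channels, shape (d, HW)
        for i in range(d):
            # Similarity between channel i and every channel j, scaled by sigma[h].
            scores = [sigma[h] * sum(a * b for a, b in zip(qk[i], qk[j])) for j in range(d)]
            weights = softmax(scores)
            # Output channel i is a convex combination of the V channels of this head.
            for p in range(hw):
                out[p][h * d + i] = sum(weights[j] * v[j][p] for j in range(d))
    return out
```

With one channel per head the attention map is the 1 x 1 matrix [1], so the input passes through unchanged; with more channels per head, every output value lies between the smallest and largest input values of that head at the same pixel.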
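The BD-Bitrate figure compares two rate-distortion curves. In the classic polynomial-fit formulation (Bjøntegaard, VCEG-M33), log10(bitrate) is fitted as a cubic function of PSNR for each codec, both fits are integrated over the overlapping PSNR range, and the average log-rate difference is converted back to a percentage; a negative value means fewer bits at equal quality. The sketch below follows that formulation in pure Python. The function names and the sample rate points are hypothetical, and official JVET reporting templates may use a different interpolation, so results can differ slightly from published numbers.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (m[r][n] - sum(m[r][k] * sol[k] for k in range(r + 1, n))) / m[r][r]
    return sol


def polyfit3(t: Sequence[float], y: Sequence[float]) -> List[float]:
    """Least-squares cubic y ~ c0 + c1*t + c2*t^2 + c3*t^3 via normal equations."""
    a = [[sum(ti ** (i + j) for ti in t) for j in range(4)] for i in range(4)]
    b = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in range(4)]
    return solve(a, b)


def integrate(c: List[float], lo: float, hi: float) -> float:
    """Exact integral of the cubic with coefficients c over [lo, hi]."""
    def prim(t: float) -> float:
        return sum(ci * t ** (i + 1) / (i + 1) for i, ci in enumerate(c))
    return prim(hi) - prim(lo)


def bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr) -> float:
    """Bjøntegaard delta bitrate in percent (negative = bitrate savings)."""
    lo = max(min(anchor_psnr), min(test_psnr))  # overlapping PSNR interval
    hi = min(max(anchor_psnr), max(test_psnr))
    mid = (lo + hi) / 2  # common PSNR shift, only to improve conditioning
    ca = polyfit3([p - mid for p in anchor_psnr], [math.log10(r) for r in anchor_rate])
    ct = polyfit3([p - mid for p in test_psnr], [math.log10(r) for r in test_rate])
    avg_diff = (integrate(ct, lo - mid, hi - mid) - integrate(ca, lo - mid, hi - mid)) / (hi - lo)
    return (10 ** avg_diff - 1) * 100
```

For example, if a filter reaches the anchor's PSNR at every rate point with exactly 90% of the bits, `bd_rate` returns approximately -10.0, i.e. a 10% BD-Bitrate saving, which is how a luma saving such as the 10.31% reported above is expressed.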
Pages: 373-377
Number of pages: 5