Multi-Stage Spatial and Frequency Feature Fusion using Transformer in CNN-Based In-Loop Filter for VVC

Cited by: 4
Authors
Kathariya, Birendra [1]
Li, Zhu [1]
Wang, Hongtao [2]
Coban, Mohammad [2]
Affiliations
[1] Univ Missouri, Kansas City, MO 64110 USA
[2] Qualcomm Technol Inc, San Diego, CA USA
Keywords
Versatile Video Coding (VVC); In-Loop Filter; Discrete Cosine Transform (DCT); Convolutional Neural Network; Transformer
DOI
10.1109/PCS56426.2022.10017998
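Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract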
Versatile Video Coding (VVC)/H.266 is the successor to High Efficiency Video Coding (HEVC)/H.265 and Advanced Video Coding (AVC)/H.264, with significant technical and coding-efficiency improvements. Nonetheless, like its predecessors, it follows the conventional block-based hybrid video coding scheme, and as a consequence the reconstructed picture contains compression artifacts. VVC includes in-loop filters by default to correct these artifacts, but these handcrafted filters offer suboptimal performance. In this work, we design a novel convolutional neural network (CNN) to replace the built-in in-loop filter of VVC. The proposed CNN-based in-loop filter uses a modified Spectral-wise Multi-Head Self-Attention (S-MSA) layer from the Multi-stage Spectral-wise Transformer (MST++) at multiple stages to fuse spatial features, extracted from the pixel-domain input, with frequency-decomposed features, extracted from the discrete cosine transform (DCT) of that input. We name the proposed network MSTFNet, where the first three letters stand for MST++ and F stands for fusion. Because of the multi-stage feature fusion, the proposed CNN acts as a powerful learned in-loop filter that significantly outperforms previous methods. Our experimental results show that the proposed method achieves average Bjontegaard Delta (BD)-bitrate savings of up to 10.31% for the luma (Y) component under the all-intra (AI) configuration.
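The abstract does not say how the DCT-applied input is arranged before feature extraction. As a rough illustration only, the sketch below shows one common way to obtain frequency-decomposed channels: split the picture into blocks, apply an orthonormal 2-D DCT-II to each block, and regroup the coefficients so that each channel holds one frequency position across all blocks. The 4x4 block size and the function names (`dct2`, `frequency_channels`) are assumptions made for this example, not details taken from the paper.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block):
    """2-D DCT-II of a square block, computed as C @ X @ C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def frequency_channels(image, n=4):
    """Block-wise DCT, regrouped so channel u*n+v holds coefficient (u, v)
    of every n x n block. The block size n=4 is an assumption."""
    bh, bw = len(image) // n, len(image[0]) // n
    channels = [[[0.0] * bw for _ in range(bh)] for _ in range(n * n)]
    for by in range(bh):
        for bx in range(bw):
            block = [row[bx * n:(bx + 1) * n]
                     for row in image[by * n:(by + 1) * n]]
            coeffs = dct2(block)
            for u in range(n):
                for v in range(n):
                    channels[u * n + v][by][bx] = coeffs[u][v]
    return channels


# A flat 8x8 picture: all energy should land in the DC channel (index 0).
image = [[10.0] * 8 for _ in range(8)]
channels = frequency_channels(image)
print(len(channels), round(channels[0][0][0], 6))  # 16 40.0
```

For a flat picture, only the DC channel is non-zero, which is the property that makes such channels a frequency-wise counterpart to the pixel-domain input. How MSTFNet actually fuses the two branches through S-MSA is not specified in this record and is not modeled here.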
Pages: 373-377
Page count: 5