Multi-Stage Spatial and Frequency Feature Fusion using Transformer in CNN-Based In-Loop Filter for VVC

Times cited: 4
Authors
Kathariya, Birendra [1]
Li, Zhu [1]
Wang, Hongtao [2]
Coban, Mohammad [2]
Affiliations
[1] Univ Missouri, Kansas City, MO 64110 USA
[2] Qualcomm Technol Inc, San Diego, CA USA
Keywords
Versatile Video Coding (VVC); In-Loop Filter; Discrete Cosine Transform (DCT); Convolutional Neural Network; Transformer
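DOI
10.1109/PCS56426.2022.10017998
Chinese Library Classification
TM [Electrical technology]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract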
Versatile Video Coding (VVC)/H.266 is the successor to High Efficiency Video Coding (HEVC)/H.265 and Advanced Video Coding (AVC)/H.264, bringing significant technical and coding-efficiency improvements. Nonetheless, like its predecessors, it follows the conventional block-based hybrid video coding scheme. As a consequence, the reconstructed picture contains compression artifacts. By default, VVC applies in-loop filters to correct these distortions, but these handcrafted filters offer suboptimal performance. In this work, we design a novel convolutional neural network (CNN) to replace the built-in in-loop filter of VVC. The proposed CNN-based in-loop filter uses a modified Spectral-wise Multi-Head Self-Attention (S-MSA) layer from the Multi-stage Spectral-wise Transformer (MST++) at multiple stages to fuse spatial and frequency-decomposed features, extracted from the pixel-domain input and from its discrete cosine transform (DCT), respectively. We name the proposed network MSTFNet, where the first three letters refer to MST++ and F stands for fusion. Because of the multi-stage feature fusion operation, the proposed CNN acts as a powerful learned in-loop filter that significantly outperforms previous methods. Our experimental results show that the proposed method achieves average Bjontegaard Delta bitrate (BD-rate) savings of up to 10.31% for the luma (Y) component under the all-intra (AI) configuration.
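The abstract describes the fusion mechanism only at a high level. The NumPy sketch below illustrates the two ingredients it names, under assumptions that are ours and not the paper's: the frequency branch is taken to be a block DCT whose same-frequency coefficients are gathered into channels, the spatial branch is a plain space-to-depth rearrangement of the same pixels, fusion is assumed to be channel concatenation, and the attention follows the channel-wise ("spectral-wise") formulation of MST++, in which each head's attention map is d×d over channels rather than N×N over pixels. All helper names (`dct_matrix`, `to_blocks`, `space_to_depth`, `block_dct`, `spectral_msa`) are hypothetical, the weights are random, and MST++'s positional-embedding branch and MSTFNet's convolutional layers are omitted.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]  # frequency index
    i = np.arange(n)[None, :]  # sample index
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)  # the DC row uses the smaller scale factor
    return d


def to_blocks(img: np.ndarray, b: int) -> np.ndarray:
    """(H, W) -> (H/b, W/b, b, b): one b x b pixel block per output position."""
    h, w = img.shape
    assert h % b == 0 and w % b == 0, "H and W must be multiples of b"
    return img.reshape(h // b, b, w // b, b).transpose(0, 2, 1, 3)


def space_to_depth(img: np.ndarray, b: int) -> np.ndarray:
    """Spatial branch (assumption): the pixels of each block become its channels."""
    blocks = to_blocks(img, b)
    return blocks.reshape(*blocks.shape[:2], b * b)


def block_dct(img: np.ndarray, b: int) -> np.ndarray:
    """Frequency branch (assumption): 2-D DCT per block, one channel per frequency."""
    d = dct_matrix(b)
    coeffs = d @ to_blocks(img, b) @ d.T  # separable 2-D DCT, broadcast over blocks
    return coeffs.reshape(*coeffs.shape[:2], b * b)


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)


def spectral_msa(x, wq, wk, wv, wo, heads, scale=1.0):
    """Channel-wise ("spectral-wise") multi-head self-attention in the style of MST++.

    x: (N, C) pixel tokens. Each head attends over its d = C // heads channels,
    so the attention map is (d, d) regardless of the number of pixels N.
    """
    n, c = x.shape
    dh = c // heads
    # Project, split channels into heads, and move to (heads, d, N).
    q, k, v = (
        (x @ w).reshape(n, heads, dh).transpose(1, 2, 0) for w in (wq, wk, wv)
    )
    # L2-normalise every channel over the pixel axis, as MST++ does for Q and K.
    q = q / (np.linalg.norm(q, axis=-1, keepdims=True) + 1e-6)
    k = k / (np.linalg.norm(k, axis=-1, keepdims=True) + 1e-6)
    attn = softmax(scale * (k @ q.transpose(0, 2, 1)), axis=-1)  # (heads, d, d)
    out = attn @ v  # (heads, d, N): channels re-mixed by the attention weights
    out = out.transpose(2, 0, 1).reshape(n, c)  # back to (N, C)
    return out @ wo, attn


# Usage on a random stand-in for a reconstructed 16x16 luma block.
rng = np.random.default_rng(0)
frame = rng.random((16, 16))
b = 4
spatial = space_to_depth(frame, b)  # (4, 4, 16)
freq = block_dct(frame, b)  # (4, 4, 16)
fused = np.concatenate([spatial, freq], axis=-1)  # assumed fusion: channel concat
hh, ww, c = fused.shape
tokens = fused.reshape(hh * ww, c)  # (16, 32)
wq, wk, wv, wo = (rng.standard_normal((c, c)) / np.sqrt(c) for _ in range(4))
out, attn = spectral_msa(tokens, wq, wk, wv, wo, heads=4)
print(out.shape, attn.shape)  # (16, 32) (4, 8, 8)
```

Because attention is computed between channels, its cost grows linearly with the number of pixels N instead of quadratically, which is what makes such a layer practical on full-resolution frames. Concatenating the spatial and DCT channels lets each head weigh pixel-domain evidence against frequency-domain evidence directly.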
Pages: 373-377
Number of pages: 5