Cross-scale hierarchical spatio-temporal transformer for video enhancement

Authors
Jiang, Qin [1, 2, 3]
Wang, Qinglin [1, 2, 3]
Chi, Lihua [4]
Liu, Jie [1, 2, 3]
Affiliations
[1] Natl Univ Def Technol, Changsha, Peoples R China
[2] Lab Digitizing Software Frontier Equipment, Changsha, Peoples R China
[3] Sci & Technol Parallel & Distributed Proc Lab, Changsha, Peoples R China
[4] Hunan GuoKe Computil Technol Co Ltd, Changsha, Peoples R China
Keywords
Video super-resolution; Denoising; Deblurring; Transformer; Temporal;
DOI
10.1016/j.knosys.2024.112773
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104; 0812; 0835; 1405;
Abstract
The diversity and complexity of degradations in low-quality videos pose non-trivial challenges for video enhancement, which aims to reconstruct their high-quality counterparts. Prevailing sliding-window-based methods perform poorly because of the limited window size. Recurrent networks take advantage of long-term modeling to aggregate more information, resulting in significant performance improvements. However, most of them are trained on data with simple degradations and can only tackle specific degradations. To break through this limitation, we propose a progressive alignment network, namely the Cross-scale Hierarchical Spatio-Temporal Transformer (CHSTT), which leverages cross-scale tokenization to generate multi-scale visual tokens across the entire sequence and thereby capture extensive long-range temporal dependencies. To enhance spatial and temporal interactions, we introduce a hierarchical Transformer that facilitates the computation of mutual multi-head attention across both spatial and temporal dimensions. Quantitative and qualitative assessments substantiate the superior performance of CHSTT compared with several state-of-the-art benchmarks across three distinct video enhancement tasks: video super-resolution, video denoising, and video deblurring.
Pages: 13