Cross-scale hierarchical spatio-temporal transformer for video enhancement

Authors
Jiang, Qin [1, 2, 3]
Wang, Qinglin [1, 2, 3]
Chi, Lihua [4]
Liu, Jie [1, 2, 3]
Affiliations
[1] Natl Univ Def Technol, Changsha, Peoples R China
[2] Lab Digitizing Software Frontier Equipment, Changsha, Peoples R China
[3] Sci & Technol Parallel & Distributed Proc Lab, Changsha, Peoples R China
[4] Hunan GuoKe Computil Technol Co Ltd, Changsha, Peoples R China
Keywords
Video super-resolution; Denoising; Deblurring; Transformer; Temporal
DOI
10.1016/j.knosys.2024.112773
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The diversity and complexity of degradations in low-quality videos pose non-trivial challenges for video enhancement, which aims to reconstruct their high-quality counterparts. Prevailing sliding-window-based methods perform poorly because of their limited window size. Recurrent networks take advantage of long-term modeling to aggregate more information, resulting in significant performance improvements. However, most of them are trained on data with simple degradations and can only tackle specific degradations. To break through this limitation, we propose a progressive alignment network, namely the Cross-scale Hierarchical Spatio-Temporal Transformer (CHSTT), which leverages cross-scale tokenization to generate multi-scale visual tokens across the entire sequence and capture extensive long-range temporal dependencies. To enhance spatial and temporal interactions, we introduce an innovative hierarchical Transformer that facilitates the computation of mutual multi-head attention across both spatial and temporal dimensions. Quantitative and qualitative assessments substantiate the superior performance of CHSTT compared with several state-of-the-art benchmarks across three distinct video enhancement tasks: video super-resolution, video denoising, and video deblurring.
Pages: 13