Cross-scale hierarchical spatio-temporal transformer for video enhancement

Authors
Jiang, Qin [1, 2, 3]
Wang, Qinglin [1, 2, 3]
Chi, Lihua [4]
Liu, Jie [1, 2, 3]
Affiliations
[1] Natl Univ Def Technol, Changsha, Peoples R China
[2] Lab Digitizing Software Frontier Equipment, Changsha, Peoples R China
[3] Sci & Technol Parallel & Distributed Proc Lab, Changsha, Peoples R China
[4] Hunan GuoKe Computil Technol Co Ltd, Changsha, Peoples R China
Keywords
Video super-resolution; Denoising; Deblurring; Transformer; Temporal;
DOI
10.1016/j.knosys.2024.112773
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The diversity and complexity of degradations in low-quality videos pose non-trivial challenges for video enhancement, which aims to reconstruct the high-quality counterparts. Prevailing sliding-window-based methods perform poorly because of the limited window size. Recurrent networks take advantage of long-term modeling to aggregate more information, resulting in significant performance improvements. However, most of them are trained on simple degraded data and can only tackle specific degradations. To break through this limitation, we propose a progressive alignment network, namely Cross-scale Hierarchical Spatio-Temporal Transformer (CHSTT), which leverages cross-scale tokenization to generate multi-scale visual tokens across the entire sequence and thereby capture extensive long-range temporal dependencies. To enhance the spatial and temporal interactions, we introduce an innovative hierarchical Transformer, facilitating the computation of mutual multi-head attention across both spatial and temporal dimensions. Quantitative and qualitative assessments substantiate the superior performance of CHSTT compared to several state-of-the-art benchmarks across three distinct video enhancement tasks: video super-resolution, video denoising, and video deblurring.
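As a rough illustration of what generating multi-scale tokens over a whole sequence could look like, the sketch below cuts every frame of a video into non-overlapping patches at two patch sizes and flattens each patch into one token. It is not the CHSTT implementation: the function names `tokenize` and `cross_scale_tokens`, the patch sizes 4 and 8, and the use of plain non-overlapping patches are all assumptions made for this example.

```python
import numpy as np


def tokenize(video, patch):
    """Split a (T, H, W, C) video into non-overlapping patch x patch tokens.

    Returns an array of shape (T * (H // patch) * (W // patch), patch * patch * C),
    so tokens from all frames sit in one sequence.
    """
    t, h, w, c = video.shape
    assert h % patch == 0 and w % patch == 0, "frame size must be divisible by patch"
    # Separate each spatial axis into (number of patches, pixels per patch).
    x = video.reshape(t, h // patch, patch, w // patch, patch, c)
    # Bring the two patch-index axes together, then the within-patch axes.
    x = x.transpose(0, 1, 3, 2, 4, 5)
    # Flatten every patch into a single token vector.
    return x.reshape(-1, patch * patch * c)


def cross_scale_tokens(video, patches=(4, 8)):
    """Hypothetical helper: tokenize the same sequence at several patch sizes."""
    return {p: tokenize(video, p) for p in patches}


# Toy sequence: 5 frames of 16 x 16 RGB (assumed sizes, for illustration only).
rng = np.random.default_rng(0)
video = rng.random((5, 16, 16, 3))
tokens = cross_scale_tokens(video)
for p, tok in tokens.items():
    print(p, tok.shape)
```

Smaller patches yield many fine-grained tokens and larger patches yield fewer, coarser ones; an attention module could then relate tokens both within a frame and across frames.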
Pages: 13
Related papers
50 in total
  • [1] Spatio-temporal video contrast enhancement
    Celik, Turgay
    IET IMAGE PROCESSING, 2013, 7 (06) : 543 - 555
  • [2] Spatio-Temporal Transformer Network for Video Restoration
    Kim, Tae Hyun
    Sajjadi, Mehdi S. M.
    Hirsch, Michael
    Schoelkopf, Bernhard
    COMPUTER VISION - ECCV 2018, PT III, 2018, 11207 : 111 - 127
  • [3] MSTG: Multi-Scale Transformer with Gradient for joint spatio-temporal enhancement
    Lin, Xin
    Chen, Junli
    Ai, Shaojie
    Liu, Jing
    Li, Bochao
    Li, Qingying
    Ma, Rui
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 102
  • [4] Spatio-Temporal Consistency in Depth Video Enhancement
    Li, Li
    Zhang, Caiming
    JOURNAL OF ADVANCED MECHANICAL DESIGN SYSTEMS AND MANUFACTURING, 2013, 7 (05) : 808 - 817
  • [5] Transformer with Spatio-Temporal Representation for Video Anomaly Detection
    Sun, Xiaohu
    Chen, Jinyi
    Shen, Xulin
    Li, Hongjun
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2022, 2022, 13813 : 213 - 222
  • [6] Spatio-Temporal Inference Transformer Network for Video Inpainting
    Tudavekar, Gajanan
    Saraf, Santosh S.
    Patil, Sanjay R.
    INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS, 2023, 23 (01)
  • [7] Spatio-Temporal Scale Selection in Video Data
    Tony Lindeberg
    Journal of Mathematical Imaging and Vision, 2018, 60 : 525 - 562
  • [8] Spatio-Temporal Scale Selection in Video Data
    Lindeberg, Tony
    JOURNAL OF MATHEMATICAL IMAGING AND VISION, 2018, 60 (04) : 525 - 562
  • [9] Spatio-Temporal Scale Selection in Video Data
    Lindeberg, Tony
    SCALE SPACE AND VARIATIONAL METHODS IN COMPUTER VISION, SSVM 2017, 2017, 10302 : 3 - 15
  • [10] STAR++: Rethinking spatio-temporal cross attention transformer for video action recognition
    Dasom Ahn
    Sangwon Kim
    Byoung Chul Ko
    Applied Intelligence, 2023, 53 : 28446 - 28459