Recurrent Video Restoration Transformer with Guided Deformable Attention

Cited by: 0
Authors
Liang, Jingyun [1]
Fan, Yuchen [2]
Xiang, Xiaoyu [2]
Ranjan, Rakesh [2]
Ilg, Eddy [2]
Green, Simon [2]
Cao, Jiezhang [1]
Zhang, Kai [1]
Timofte, Radu [1, 3]
Van Gool, Luc [1]
Affiliations
[1] Swiss Fed Inst Technol, Comp Vis Lab, Zurich, Switzerland
[2] Meta Inc, Menlo Pk, CA USA
[3] Univ Wurzburg, Wurzburg, Germany
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video restoration aims at restoring multiple high-quality frames from multiple low-quality frames. Existing video restoration methods generally fall into two extreme cases, i.e., they either restore all frames in parallel or restore the video frame by frame in a recurrent way, which would result in different merits and drawbacks. Typically, the former has the advantage of temporal information fusion. However, it suffers from large model size and intensive memory consumption; the latter has a relatively small model size as it shares parameters across frames; however, it lacks long-range dependency modeling ability and parallelizability. In this paper, we attempt to integrate the advantages of the two cases by proposing a recurrent video restoration transformer, namely RVRT. RVRT processes local neighboring frames in parallel within a globally recurrent framework which can achieve a good trade-off between model size, effectiveness, and efficiency. Specifically, RVRT divides the video into multiple clips and uses the previously inferred clip feature to estimate the subsequent clip feature. Within each clip, different frame features are jointly updated with implicit feature aggregation. Across different clips, the guided deformable attention is designed for clip-to-clip alignment, which predicts multiple relevant locations from the whole inferred clip and aggregates their features by the attention mechanism. Extensive experiments on video super-resolution, deblurring, and denoising show that the proposed RVRT achieves state-of-the-art performance on benchmark datasets with balanced model size, testing memory and runtime. The codes are available at https://github.com/JingyunLiang/RVRT.
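The clip-wise recurrence and attention-based aggregation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that): the function names `split_into_clips`, `recurrent_clip_pass`, `attention_aggregate`, and `toy_update` are hypothetical, frames are stood in for by plain numbers, and the attention step is ordinary scaled dot-product attention over candidates that are already given. It does not predict deformable sampling locations, which is the part that guided deformable attention adds.

```python
import math
from typing import Callable, List, Sequence


def split_into_clips(frames: Sequence, clip_len: int) -> List[list]:
    """Split a frame sequence into consecutive clips of at most clip_len frames."""
    if clip_len < 1:
        raise ValueError("clip_len must be at least 1")
    return [list(frames[i:i + clip_len]) for i in range(0, len(frames), clip_len)]


def recurrent_clip_pass(frames: Sequence, clip_len: int,
                        update_clip: Callable, init_feature):
    """Process clips in order; each clip feature depends on the previous one."""
    prev_feature = init_feature
    clip_features = []
    for clip in split_into_clips(frames, clip_len):
        # Frames inside a clip are handled together (hypothetical update_clip),
        # while information flows across clips through prev_feature.
        prev_feature = update_clip(clip, prev_feature)
        clip_features.append(prev_feature)
    return clip_features


def attention_aggregate(query: List[float], keys: List[List[float]],
                        values: List[List[float]]) -> List[float]:
    """Softmax-weighted sum of values using scaled query-key dot products."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)]


def toy_update(clip, prev_feature):
    """Toy stand-in for a learned clip update: sum of frames plus previous feature."""
    return sum(clip) + prev_feature


# Five toy "frames" grouped into clips of two frames each.
clips = split_into_clips([0, 1, 2, 3, 4], 2)
features = recurrent_clip_pass([0, 1, 2, 3, 4], 2, toy_update, 0)

# Two candidate locations with identical keys receive equal attention weight.
aggregated = attention_aggregate([1.0, 0.0],
                                 [[0.0, 1.0], [0.0, 1.0]],
                                 [[2.0], [4.0]])
```

In this sketch the clip length sets the trade-off the abstract refers to: a clip length of 1 reduces to frame-by-frame recurrence, while a clip length equal to the video length processes all frames in one parallel step.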
Pages: 16
Related papers
  • [1] Deformable Video Transformer
    Wang, Jue
    Torresani, Lorenzo
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 14033 - 14042
  • [2] Vision Transformer with Deformable Attention
    Xia, Zhuofan
    Pan, Xuran
    Song, Shiji
    Li, Li Erran
    Huang, Gao
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 4784 - 4793
  • [3] Video Sparse Transformer With Attention-Guided Memory for Video Object Detection
    Fujitake, Masato
    Sugimoto, Akihiro
    IEEE ACCESS, 2022, 10 : 65886 - 65900
  • [4] Attention-guided video super-resolution with recurrent multi-scale spatial–temporal transformer
    Wei Sun
    Xianguang Kong
    Yanning Zhang
    Complex & Intelligent Systems, 2023, 9 : 3989 - 4002
  • [5] Attention-guided video super-resolution with recurrent multi-scale spatial-temporal transformer
    Sun, Wei
    Kong, Xianguang
    Zhang, Yanning
    COMPLEX & INTELLIGENT SYSTEMS, 2023, 9 (04) : 3989 - 4002
  • [6] VRT: A Video Restoration Transformer
    Liang, Jingyun
    Cao, Jiezhang
    Fan, Yuchen
    Zhang, Kai
    Ranjan, Rakesh
    Li, Yawei
    Timofte, Radu
    Van Gool, Luc
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 2171 - 2182
  • [7] FLOW-GUIDED DEFORMABLE ATTENTION NETWORK FOR FAST ONLINE VIDEO SUPER-RESOLUTION
    Yang, Xi
    Zhang, Xindong
    Zhang, Lei
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 390 - 394
  • [8] Object Detection in Drone Video with Temporal Attention Gated Recurrent Unit Based on Transformer
    Zhou, Zihao
    Yu, Xianguo
    Chen, Xiangcheng
    DRONES, 2023, 7 (07)
  • [9] Identification of Fish Hunger Degree with Deformable Attention Transformer
    Wu, Yuqiang
    Xu, Huanliang
    Wu, Xuehui
    Wang, Haiqing
    Zhai, Zhaoyu
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2024, 12 (05)
  • [10] DEFORMABLE VISTR: SPATIO TEMPORAL DEFORMABLE ATTENTION FOR VIDEO INSTANCE SEGMENTATION
    Yarram, Sudhir
    Wu, Jialian
    Ji, Pan
    Xu, Yi
    Yuan, Junsong
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 3303 - 3307