Aggregating Global and Local Representations via Hybrid Transformer for Video Deraining

Cited by: 1
Authors
Mao, Deqian [1]
Gao, Shanshan [2]
Li, Zhenyu [1]
Dai, Honghao [1]
Zhang, Yunfeng [1,3]
Zhou, Yuanfeng
Affiliations
[1] Shandong Univ Finance & Econ, Sch Comp Sci & Technol, Jinan 250014, Peoples R China
[2] Shandong Univ Finance & Econ, Sch Comp Sci & Technol, Shandong China US Digital Media Int Cooperat Res C, Key Lab Digital Media Technol Shandong Prov, Jinan 250014, Peoples R China
[3] Shandong Univ, Sch Software, Jinan 250101, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Rain; Transformers; Feature extraction; Aggregates; Task analysis; Imaging; Image reconstruction; Video deraining; hybrid transformer; global and local representations; VDN-HT; REMOVAL; RAIN; LANGUAGE; VISION;
DOI
10.1109/TCSVT.2024.3372944
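Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract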
Although video deraining has achieved great success in recent years, three tasks remain challenging: extracting spatiotemporal feature representations across the spatial and temporal domains of successive frames, modeling spatial and temporal relationships, and restoring high-quality derained videos with rich details. In this paper, we make the first attempt to apply a hybrid Transformer to video rain removal and propose a novel video deraining network based on a hybrid Transformer (VDN-HT) that aggregates global and local representations. For feature extraction, we use a U-shaped structure built from serial Transformer blocks to extract shallow local features, deep global features, and global dependencies, and then adaptively aggregate them to obtain rainy-video features covering rain streaks of different directions and densities. To better model spatiotemporal relationships, VDN-HT uses the Transformer's long-range and relational modeling abilities to capture spatial features and temporal correlations between consecutive video frames, and thereby achieves multi-frame alignment. To ensure global-local consistency in the reconstructed frames, we design a global-local reconstruction module that runs a Transformer and a convolutional neural network (CNN) in parallel and aggregates global and local information to better reconstruct each frame. In addition, the proposed gating-based refinement module and color loss effectively retain details and color information after rain streaks are removed. Extensive experiments on the NTURain, RainSynLight25, and RainSynHeavy25 datasets show that VDN-HT can handle many types of rainy videos and performs better than previous methods.
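The parallel global-local idea mentioned in the abstract can be illustrated with a toy sketch. This is not the VDN-HT implementation: the abstract gives no layer definitions, so everything below is an assumption made for illustration. The global branch is plain single-head self-attention without learned projections, the local branch is a fixed 3-tap convolution along the token axis, and the hypothetical `gate` scalar stands in for whatever adaptive aggregation the paper actually uses.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_branch(tokens):
    """Toy self-attention: every token attends to all tokens (global context).

    Assumption: queries, keys and values are the raw tokens (no learned weights).
    """
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out


def local_branch(tokens, kernel=(0.25, 0.5, 0.25)):
    """Toy 3-tap convolution along the token axis (local context).

    Edges are handled by replicating the first and last token.
    """
    n = len(tokens)
    dim = len(tokens[0])
    out = []
    for i in range(n):
        neighbors = [tokens[max(i - 1, 0)], tokens[i], tokens[min(i + 1, n - 1)]]
        out.append([sum(k * nb[d] for k, nb in zip(kernel, neighbors)) for d in range(dim)])
    return out


def aggregate(tokens, gate=0.5):
    """Run both branches in parallel on the same input and blend them.

    `gate` is a hypothetical fixed mixing weight, not a learned gate.
    """
    g = global_branch(tokens)
    l = local_branch(tokens)
    return [
        [gate * gv + (1.0 - gate) * lv for gv, lv in zip(g_row, l_row)]
        for g_row, l_row in zip(g, l)
    ]


if __name__ == "__main__":
    frame_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for row in aggregate(frame_tokens):
        print([round(v, 3) for v in row])
```

Both branches see the same input and keep its shape, which is what lets their outputs be blended element-wise. In a real network the blend would typically be learned rather than fixed.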
Pages: 7512-7522
Number of pages: 11
Related papers
50 in total
  • [41] A unified framework for unsupervised action learning via global-to-local motion transformer
    Kim, Boeun
    Kim, Jungho
    Chang, Hyung Jin
    Oh, Tae-Hyun
    PATTERN RECOGNITION, 2025, 159
  • [42] Exploring high-quality image deraining Transformer via effective large kernel attention
    Haobo Dong
    Tianyu Song
    Xuanyu Qi
    Jiyu Jin
    Guiyue Jin
    Lei Fan
    The Visual Computer, 2025, 41 (4) : 2545 - 2561
  • [43] Event-Aware Video Deraining via Multi-Patch Progressive Learning
    Sun, Shangquan
    Ren, Wenqi
    Li, Jingzhi
    Zhang, Kaihao
    Liang, Meiyu
    Cao, Xiaochun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 3040 - 3053
  • [44] Local Frequency Domain Transformer Networks for Video Prediction
    Farazi, Hafez
    Nogga, Jan
    Behnke, Sven
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [45] Aggregating Global and Local Visual Representation for Vehicle Re-IDentification
    Lin, Xianming
    Li, Run
    Zheng, Xiawu
    Peng, Pai
    Wu, Yongjian
    Huang, Feiyue
    Ji, Rongrong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 3968 - 3977
  • [46] Video Summarization with Global and Local Features
    Guan, Genliang
    Wang, Zhiyong
    Yu, Kaimin
    Mei, Shaohui
    He, Mingyi
    Feng, Dagan
    2012 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), 2012, : 570 - 575
  • [47] Local and global methods in representations of Hecke algebras
    Du, Jie
    Parshall, Brian J.
    Scott, Leonard L.
    SCIENCE CHINA-MATHEMATICS, 2018, 61 (02) : 207 - 226
  • [48] Decoupling Local and Global Representations of Time Series
    Tonekaboni, Sana
    Li, Chun-Liang
    Arik, Sercan O.
    Goldenberg, Anna
    Pfister, Tomas
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [49] Local and global methods in representations of Hecke algebras
    Jie Du
    Brian J.Parshall
    Leonard L.Scott
    ScienceChina(Mathematics), 2018, 61 (02) : 207 - 226
  • [50] Local conditions for global representations of quadratic forms
    Schulze-Pillot, Rainer
    ACTA ARITHMETICA, 2009, 138 (03) : 289 - 299