Aggregating Global and Local Representations via Hybrid Transformer for Video Deraining

Cited by: 1
Authors
Mao, Deqian [1]
Gao, Shanshan [2]
Li, Zhenyu [1]
Dai, Honghao [1]
Zhang, Yunfeng [1, 3]
Zhou, Yuanfeng
Affiliations
[1] Shandong Univ Finance & Econ, Sch Comp Sci & Technol, Jinan 250014, Peoples R China
[2] Shandong Univ Finance & Econ, Sch Comp Sci & Technol, Shandong China US Digital Media Int Cooperat Res C, Key Lab Digital Media Technol Shandong Prov, Jinan 250014, Peoples R China
[3] Shandong Univ, Sch Software, Jinan 250101, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Rain; Transformers; Feature extraction; Aggregates; Task analysis; Imaging; Image reconstruction; Video deraining; hybrid transformer; global and local representations; VDN-HT; REMOVAL; RAIN; LANGUAGE; VISION
DOI
10.1109/TCSVT.2024.3372944
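Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract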
Although video deraining has advanced considerably in recent years, three tasks remain challenging: extracting spatiotemporal feature representations across the spatial and temporal domains of successive frames, modeling spatial and temporal relationships, and restoring high-quality derained videos with rich detail. In this paper, we make the first attempt to apply a hybrid Transformer to video rain removal and propose a novel video deraining network based on a hybrid Transformer (VDN-HT) that aggregates global and local representations. For feature extraction, we use a U-shaped structure built from serial Transformer blocks to extract shallow local features, deep global features, and global dependencies, and then adaptively aggregate them to obtain rainy-video features that capture rain streaks of different directions and densities. To better model spatiotemporal relationships, VDN-HT uses the Transformer's long-range and relational modeling abilities to capture spatial features and temporal correlations between consecutive video frames, achieving multi-frame alignment. To ensure global-local consistency in the reconstructed frames, we design a global-local reconstruction module that places a Transformer and a convolutional neural network (CNN) in parallel and aggregates global and local information to better reconstruct each frame. In addition, the proposed gating-based refinement module and color loss effectively retain details and color information after rain streaks are removed. Extensive experiments on the NTURain, RainSynLight25, and RainSynHeavy25 datasets show that VDN-HT handles many types of rainy videos and outperforms previous methods.
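The parallel global-local aggregation described in the abstract can be illustrated with a toy sketch. This is not the authors' VDN-HT module: it assumes a single-head self-attention with identity projections as the "global" branch, a fixed 1-D smoothing kernel as the "local" (CNN-like) branch, and a scalar sigmoid gate to blend them. All function names and parameters (`global_branch`, `local_branch`, `aggregate`, `gate_logit`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def global_branch(tokens):
    """Toy self-attention: every token attends to every other token.

    Assumption: identity query/key/value projections (no learned weights).
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens))
                    for d in range(dim)])
    return out


def local_branch(tokens, kernel=(0.25, 0.5, 0.25)):
    """Toy 1-D convolution over neighbouring tokens with edge replication."""
    n, dim = len(tokens), len(tokens[0])
    half = len(kernel) // 2
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j, w in enumerate(kernel):
            src = min(max(i + j - half, 0), n - 1)  # clamp index at the borders
            for d in range(dim):
                row[d] += w * tokens[src][d]
        out.append(row)
    return out


def aggregate(tokens, gate_logit=0.0):
    """Run both branches in parallel and blend them with a sigmoid gate."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))  # weight given to the global branch
    glob = global_branch(tokens)
    loc = local_branch(tokens)
    return [[g * a + (1.0 - g) * b for a, b in zip(g_row, l_row)]
            for g_row, l_row in zip(glob, loc)]


if __name__ == "__main__":
    frame_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
    for row in aggregate(frame_tokens):
        print([round(x, 3) for x in row])
```

In a real network the gate and both branches would be learned and would operate on 2-D feature maps; the sketch only shows how a long-range branch and a neighbourhood branch can be computed side by side and fused.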
Pages: 7512-7522
Number of pages: 11