Aggregating Global and Local Representations via Hybrid Transformer for Video Deraining

Cited by: 1
Authors
Mao, Deqian [1]
Gao, Shanshan [2]
Li, Zhenyu [1]
Dai, Honghao [1]
Zhang, Yunfeng [1,3]
Zhou, Yuanfeng
Affiliations
[1] Shandong Univ Finance & Econ, Sch Comp Sci & Technol, Jinan 250014, Peoples R China
[2] Shandong Univ Finance & Econ, Sch Comp Sci & Technol, Shandong China US Digital Media Int Cooperat Res C, Key Lab Digital Media Technol Shandong Prov, Jinan 250014, Peoples R China
[3] Shandong Univ, Sch Software, Jinan 250101, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Rain; Transformers; Feature extraction; Aggregates; Task analysis; Imaging; Image reconstruction; Video deraining; hybrid transformer; global and local representations; VDN-HT; REMOVAL; RAIN; LANGUAGE; VISION;
DOI
10.1109/TCSVT.2024.3372944
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;
Abstract
Although video deraining has made great progress in recent years, it remains challenging to extract spatiotemporal feature representations across the spatial and temporal domains of successive frames, to model spatial and temporal relationships, and to restore high-quality derained videos with rich details. In this paper, we make a first attempt to apply a hybrid Transformer to video rain removal and propose a novel video deraining network based on a hybrid Transformer (VDN-HT) that aggregates global and local representations. For feature extraction, we propose a U-shaped structure built from serial Transformer blocks that extracts shallow local features, deep global features, and global dependencies, and then adaptively aggregates them to obtain features of rainy videos containing rain streaks of different directions and densities. To better model spatiotemporal relationships, VDN-HT uses the Transformer's long-range and relational modeling abilities to capture spatial features and temporal correlations between consecutive video frames and thereby achieve multi-frame alignment. To ensure global-local consistency of the reconstructed frames, we design a global-local reconstruction module composed of a Transformer and a convolutional neural network (CNN) in parallel, which aggregates global and local information to better reconstruct each frame. In addition, the proposed gating-based refinement module and color loss effectively retain details and color information after rain streaks are removed. Extensive experiments on the NTURain, RainSynLight25, and RainSynHeavy25 datasets show that VDN-HT can handle many types of rainy videos and performs better than previous methods.
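The idea of running a global (attention) branch and a local (convolution) branch in parallel and blending them with a gate can be illustrated with a toy example. The sketch below is not the VDN-HT architecture: the abstract gives no layer details, so the scalar features, the fixed smoothing kernel, the single scalar gate, and the function names `global_branch`, `local_branch`, and `aggregate` are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def global_branch(x):
    """Toy self-attention: every position attends to all positions.

    Queries, keys, and values are the raw scalar features (no learned
    projections), which is an assumption made to keep the sketch small.
    """
    out = []
    for q in x:
        weights = softmax([q * k for k in x])
        out.append(sum(w * v for w, v in zip(weights, x)))
    return out


def local_branch(x, kernel=(0.25, 0.5, 0.25)):
    """Toy 1D convolution with a fixed kernel and edge-replication padding."""
    half = len(kernel) // 2
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - half, 0), n - 1)  # clamp index at the borders
            acc += w * x[idx]
        out.append(acc)
    return out


def aggregate(x, gate_logit=0.0):
    """Blend both branches with a sigmoid gate (hypothetical scalar gate)."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    glob = global_branch(x)
    loc = local_branch(x)
    return [g * a + (1.0 - g) * b for a, b in zip(glob, loc)]


if __name__ == "__main__":
    features = [0.0, 1.0, 4.0, 1.0, 0.0]
    print(aggregate(features))
```

With `gate_logit=0.0` the two branches are weighted equally; a large positive value makes the output follow the attention branch, and a large negative value makes it follow the convolution branch. A trained model would learn the gate and the projections rather than fix them.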
Pages: 7512 - 7522
Number of pages: 11
Related papers
50 in total
  • [22] Geoacoustic inversion via local, global, and hybrid algorithms
    Fallat, MR
    Dosso, SE
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1999, 105 (06): : 3219 - 3230
  • [23] Local-Global Context Aware Transformer for Language-Guided Video Segmentation
    Liang, Chen
    Wang, Wenguan
    Zhou, Tianfei
    Miao, Jiaxu
    Luo, Yawei
    Yang, Yi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (08) : 10055 - 10069
  • [24] LOCAL TO GLOBAL TRANSFORMER FOR VIDEO BASED 3D HUMAN POSE ESTIMATION
    Ma, Haifeng
    Lu, Ke
    Xue, Jian
    Niu, Zehai
    Gao, Pengcheng
    2022 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (IEEE ICMEW 2022), 2022,
  • [25] Local flow propagation and global multi-scale dilated Transformer for video inpainting
    Zuo, Yuting
    Chen, Jing
    Wang, Kaixing
    Lin, Qi
    Zeng, Huanqiang
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2025, 107
  • [26] ConvTransNet: A CNN-Transformer Network for Change Detection With Multiscale Global-Local Representations
    Li, Weiming
    Xue, Lihui
    Wang, Xueqian
    Li, Gang
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [27] Hybrid Local-Global Context Learning for Neural Video Compression
    Zhai, Yongqi
    Yang, Jiayu
    Jiang, Wei
    Yang, Chunhui
    Tang, Luyang
    Wang, Ronggang
    2024 DATA COMPRESSION CONFERENCE, DCC, 2024, : 322 - 331
  • [28] FaceFormer: Aggregating Global and Local Representation for Face Hallucination
    Wang, Yuanzhi
    Lu, Tao
    Zhang, Yanduo
    Wang, Zhongyuan
    Jiang, Junjun
    Xiong, Zixiang
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (06) : 2533 - 2545
  • [29] Aggregating Local and Global Text Features for Linguistic Steganalysis
    Xiang, Lingyun
    Liu, Yuhang
    You, Huiqing
    Ou, Chengfu
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1502 - 1506
  • [30] Aggregating partial, local evaluations to achieve global ranking
    Laureti, P
    Moret, L
    Zhang, YC
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2005, 345 (3-4) : 705 - 712