DSFormer: Leveraging Transformer with Cross-Modal Attention for Temporal Consistency in Low-Light Video Enhancement

Cited by: 0
Authors
Xu, JiaHao [1, 2]
Mei, ShuHao [2]
Chen, ZiZheng [2]
Zhang, DanNi [2]
Shi, Fan [1, 2]
Zhao, Meng [1, 2]
Affiliations
[1] Tianjin Univ Technol, Minist Educ, Engn Res Ctr Learning Based Intelligent Syst, Tianjin 300384, Peoples R China
[2] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Low-Light Video Enhancement; Transformer; Optical flow
DOI
10.1007/978-981-97-5612-4_3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent advancements in deep learning have significantly impacted low-light video enhancement, sparking great interest in the field. However, while these techniques have proven effective for enhancing individual static images, they struggle with temporal instability when applied to videos, leading to artifacts and flickering. This challenge is further compounded by the difficulty of obtaining dynamic low-light/high-light video pairs in real-world scenarios. Our proposed solution tackles these issues by integrating a cross-attention mechanism with optical flow. This approach helps mitigate temporal inconsistencies, often found when training with static images, by using optical flow to infer motion in individual frames. We have also developed a Transformer model (DSFormer) that leverages spatial and channel features to enhance visual quality and temporal stability in videos. Additionally, we have created a novel dual path feed-forward network (DPFN) that improves our method's ability to capture and maintain local contextual information, which is crucial for low-light enhancement. Through extensive comparative and ablation studies, we demonstrate that our approach delivers high luminance and temporal consistency in enhancement sequences.
Pages: 27-38
Number of pages: 12
Related papers
50 results in total
  • [21] Row-Column Separated Attention Based Low-Light Image/Video Enhancement
    Dong, Chengqi
    Cao, Zhiyuan
    Qi, Tuoshi
    Wu, Kexin
    Gao, Yixing
    Tang, Fan
    COMPUTER GRAPHICS FORUM, 2024, 43 (06)
  • [22] Neural substrates of perceptual enhancement by cross-modal spatial attention
    McDonald, JJ
    Teder-Sälejärvi, WA
    Di Russo, F
    Hillyard, SA
    JOURNAL OF COGNITIVE NEUROSCIENCE, 2003, 15 (01) : 10 - 19
  • [23] Cross-modal decoupling in temporal attention between audition and touch
    Stefanie Mühlberg
    Salvador Soto-Faraco
    Psychological Research, 2019, 83 : 1626 - 1639
  • [24] Cross-modal decoupling in temporal attention between audition and touch
    Muhlberg, Stefanie
    Soto-Faraco, Salvador
    PSYCHOLOGICAL RESEARCH-PSYCHOLOGISCHE FORSCHUNG, 2019, 83 (08) : 1626 - 1639
  • [25] Cross-modal Non-linear Guided Attention and Temporal Coherence in Multi-modal Deep Video Models
    Sahu, Saurabh
    Goyal, Palash
    Ghosh, Shalini
    Lee, Chul
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 313 - 321
  • [26] Learning Temporal Consistency for Low Light Video Enhancement from Single Images
    Zhang, Fan
    Li, Yu
    You, Shaodi
    Fu, Ying
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 4965 - 4974
  • [27] Enhanced Cross-Modal Transformer Model for Video Semantic Similarity Measurement
    Li, Da
    Zhu, Boqing
    Xu, Kele
    Yang, Sen
    Feng, Dawei
    Liu, Bo
    Wang, Huaimin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2024, 71 (01) : 475 - 479
  • [28] Contrastive Transformer Cross-Modal Hashing for Video-Text Retrieval
    Shen, Xiaobo
    Huang, Qianxin
    Lan, Long
    Zheng, Yuhui
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 1227 - 1235
  • [29] Low-Light Light-Field Image Enhancement With Geometry Consistency
    Liu, Deyang
    Li, Zhengqu
    Zheng, Xin
    Ma, Jian
    Feng, Yuming
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VIII, 2025, 15038 : 455 - 467
  • [30] SNR-Prior Guided Trajectory-Aware Transformer for Low-Light Video Enhancement
    Ye, Jing
    Qiu, Changzhen
    Zhang, Zhiyong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (03) : 1873 - 1885