DSFormer: Leveraging Transformer with Cross-Modal Attention for Temporal Consistency in Low-Light Video Enhancement

Cited by: 0
Authors
Xu, JiaHao [1, 2]
Mei, ShuHao [2]
Chen, ZiZheng [2]
Zhang, DanNi [2]
Shi, Fan [1, 2]
Zhao, Meng [1, 2]
Affiliations
[1] Tianjin Univ Technol, Minist Educ, Engn Res Ctr Learning Based Intelligent Syst, Tianjin 300384, Peoples R China
[2] Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Low-Light Video Enhancement; Transformer; Optical flow;
DOI
10.1007/978-981-97-5612-4_3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent advancements in deep learning have significantly impacted low-light video enhancement, sparking great interest in the field. However, while these techniques have proven effective for enhancing individual static images, they struggle with temporal instability when applied to videos, leading to artifacts and flickering. This challenge is further compounded by the difficulty of obtaining dynamic low-light/high-light video pairs in real-world scenarios. Our proposed solution tackles these issues by integrating a cross-attention mechanism with optical flow. This approach helps mitigate temporal inconsistencies, often found when training with static images, by using optical flow to infer motion in individual frames. We have also developed a Transformer model (DSFormer) that leverages spatial and channel features to enhance visual quality and temporal stability in videos. Additionally, we have created a novel dual path feed-forward network (DPFN) that improves our method's ability to capture and maintain local contextual information, which is crucial for low-light enhancement. Through extensive comparative and ablation studies, we demonstrate that our approach delivers high luminance and temporal consistency in enhancement sequences.
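The abstract does not specify how the cross-attention between frame content and optical flow is wired, so the following is only a minimal sketch of generic scaled dot-product cross-attention, not the DSFormer implementation. It assumes queries come from frame features and keys/values from optical-flow features; the function name `cross_attention`, the token shapes, and the random projection matrices are all hypothetical and chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(frame_tokens, flow_tokens, w_q, w_k, w_v):
    """Generic scaled dot-product cross-attention (illustrative only).

    frame_tokens: (N, C) features from the low-light frame (assumed query source).
    flow_tokens:  (M, C) features from the optical-flow field (assumed key/value source).
    w_q, w_k, w_v: (C, d) projection matrices (hypothetical, random below).
    """
    q = frame_tokens @ w_q                     # (N, d)
    k = flow_tokens @ w_k                      # (M, d)
    v = flow_tokens @ w_v                      # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, M) similarity logits
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights                # fused features (N, d), attention map


# Toy inputs: 16 tokens per modality, 32 channels, projected to 8 dimensions.
rng = np.random.default_rng(0)
frame_tokens = rng.normal(size=(16, 32))
flow_tokens = rng.normal(size=(16, 32))
w_q, w_k, w_v = (rng.normal(size=(32, 8)) * 0.1 for _ in range(3))

fused, weights = cross_attention(frame_tokens, flow_tokens, w_q, w_k, w_v)
```

In this sketch each frame token gathers motion information from all flow tokens, weighted by similarity; a real model would learn the projections and typically use multiple heads.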
Pages: 27-38
Page count: 12