TFIV: Multigrained Token Fusion for Infrared and Visible Image via Transformer

Cited by: 9
Authors
Li, Jing [1]
Yang, Bin [2]
Bai, Lu [3, 4]
Dou, Hao [5]
Li, Chang [6]
Ma, Lingfei [7]
Affiliations
[1] Cent Univ Finance & Econ, Sch Informat, Beijing 102206, Peoples R China
[2] Hunan Univ, Coll Elect & Informat Engn, Changsha 410082, Peoples R China
[3] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
[4] Cent Univ Finance & Econ, Beijing 100081, Peoples R China
[5] China Elect Technol Grp Corp, Res Inst 38, Hefei 230088, Peoples R China
[6] Hefei Univ Technol, Dept Biomed Engn, Hefei 230009, Peoples R China
[7] Cent Univ Finance & Econ, Sch Stat & Math, Beijing 102206, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image fusion; infrared image; transformer; visible image; MULTI-FOCUS; NETWORK; FRAMEWORK;
DOI
10.1109/TIM.2023.3312755
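Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808; 0809;
Abstract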
Existing transformer-based infrared and visible image fusion methods mainly focus on the self-attention correlations within each modality; however, they neglect the inter-modal discrepancies at the same position of the two source images, where the information carried by the infrared token and the visible token is unbalanced. Therefore, we develop a pure transformer fusion model that reconstructs the fused image in the token dimension. It not only perceives intra-modal long-range dependencies through the transformer's self-attention mechanism, but also captures the attentive correlation between modalities in token space. Moreover, to enhance and balance the interaction of inter-modal tokens when fusing the corresponding infrared and visible tokens, learnable attentive weights are applied to dynamically measure the correlation of inter-modal tokens at the same position. Concretely, because of their modal difference, infrared and visible tokens are first processed by two independent transformers to extract intra-modal long-range dependencies. Then, the corresponding infrared and visible tokens are fused in token space to reconstruct the fused image. In addition, to comprehensively extract multiscale long-range dependencies and capture the attentive correlation of corresponding multimodal tokens at different token sizes, we extend the fusion to multigrained token-based fusion. Ablation studies and extensive experiments demonstrate the effectiveness and superiority of our model compared with nine state-of-the-art methods.
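The position-wise token fusion described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the per-modality transformer encoders are omitted entirely, the function names (`to_tokens`, `from_tokens`, `fuse_tokens`, `multigrained_fuse`) are hypothetical, the random vectors `w_ir` and `w_vis` merely stand in for learned attentive weights, and averaging over patch sizes is an assumed way to combine granularities.

```python
import numpy as np

def to_tokens(img, p):
    """Split an (H, W) image into non-overlapping p x p patches, shape (N, p*p)."""
    h, w = img.shape
    assert h % p == 0 and w % p == 0, "image size must be divisible by patch size"
    return img.reshape(h // p, p, w // p, p).transpose(0, 2, 1, 3).reshape(-1, p * p)

def from_tokens(tokens, shape, p):
    """Inverse of to_tokens: reassemble (N, p*p) tokens into an (H, W) image."""
    h, w = shape
    return tokens.reshape(h // p, w // p, p, p).transpose(0, 2, 1, 3).reshape(h, w)

def fuse_tokens(ir_tok, vis_tok, w_ir, w_vis):
    """Fuse tokens at the same position using softmax-normalised weights.

    w_ir and w_vis are hypothetical projection vectors of shape (p*p,);
    in a trained model they would be learned parameters.
    """
    scores = np.stack([ir_tok @ w_ir, vis_tok @ w_vis], axis=0)  # (2, N)
    scores = scores - scores.max(axis=0, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=0, keepdims=True)       # each column sums to 1
    return weights[0][:, None] * ir_tok + weights[1][:, None] * vis_tok

def multigrained_fuse(ir, vis, patch_sizes=(4, 8), seed=0):
    """Fuse at several token sizes and average the results (assumed combination rule)."""
    rng = np.random.default_rng(seed)
    outputs = []
    for p in patch_sizes:
        w_ir = rng.normal(size=p * p)   # placeholder for learned weights
        w_vis = rng.normal(size=p * p)  # placeholder for learned weights
        fused = fuse_tokens(to_tokens(ir, p), to_tokens(vis, p), w_ir, w_vis)
        outputs.append(from_tokens(fused, ir.shape, p))
    return np.mean(outputs, axis=0)
```

Because the two weights at each position are non-negative and sum to one, every fused token is a convex combination of its infrared and visible counterparts, so the fused pixel values always lie between the two source values.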
Pages: 14
Related papers
50 in total
  • [31] STFuse: Infrared and Visible Image Fusion via Semisupervised Transfer Learning
    Wang, Xue
    Guan, Zheng
    Qian, Wenhua
    Cao, Jinde
    Wang, Chengchao
    Ma, Runzhuo
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (01) : 160 - 173
  • [32] Infrared and visible image fusion via joint convolutional sparse representation
    Wu, Minghui
    Ma, Yong
    Fan, Fan
    Mei, Xiaoguang
    Huang, Jun
    JOURNAL OF THE OPTICAL SOCIETY OF AMERICA A-OPTICS IMAGE SCIENCE AND VISION, 2020, 37 (07) : 1105 - 1115
  • [33] LiMFusion: Infrared and visible image fusion via local information measurement
    Qian, Yao
    Tang, Haojie
    Liu, Gang
    Xiao, Gang
    Bavirisetti, Durga Prasad
    OPTICS AND LASERS IN ENGINEERING, 2024, 181
  • [34] Infrared and visible image fusion via octave Gaussian pyramid framework
    Yan, Lei
    Hao, Qun
    Cao, Jie
    Saad, Rizvi
    Li, Kun
    Yan, Zhengang
    Wu, Zhimin
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [35] Infrared and visible image fusion via NSCT and gradient domain PCNN
    Zhang Xin
    Wang Caishun
    Chen Getao
    Zhang Jiajia
    Tan Wei
    Li Huan
    Zhou Huixin
    AOPC 2021: OPTICAL SENSING AND IMAGING TECHNOLOGY, 2021, 12065
  • [36] Infrared and visible image fusion via parallel scene and texture learning
    Xu, Meilong
    Tang, Linfeng
    Zhang, Hao
    Ma, Jiayi
    PATTERN RECOGNITION, 2022, 132
  • [37] Infrared and Visible Image Fusion via Test-Time Training
    Zheng, Guoqing
    Fu, Zhenqi
    Lin, Xiaopeng
    Chu, Xueye
    Huang, Yue
    Ding, Xinghao
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT X, 2024, 14434 : 77 - 88
  • [38] Infrared and visible image fusion via octave Gaussian pyramid framework
    Lei Yan
    Qun Hao
    Jie Cao
    Rizvi Saad
    Kun Li
    Zhengang Yan
    Zhimin Wu
    Scientific Reports, 11
  • [39] Infrared and visible image fusion via parallel scene and texture learning
    Xu, Meilong
    Tang, Linfeng
    Zhang, Hao
    Ma, Jiayi
    Pattern Recognition, 2022, 132
  • [40] SIEFusion: Infrared and Visible Image Fusion via Semantic Information Enhancement
    Lv, Guohua
    Song, Wenkuo
    Wei, Zhonghe
    Cheng, Jinyong
    Dong, Aimei
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III, 2024, 14427 : 176 - 187