TFIV: Multigrained Token Fusion for Infrared and Visible Image via Transformer

Cited by: 9
Authors
Li, Jing [1 ]
Yang, Bin [2 ]
Bai, Lu [3 ,4 ]
Dou, Hao [5 ]
Li, Chang [6 ]
Ma, Lingfei [7 ]
Affiliations
[1] Cent Univ Finance & Econ, Sch Informat, Beijing 102206, Peoples R China
[2] Hunan Univ, Coll Elect & Informat Engn, Changsha 410082, Peoples R China
[3] Beijing Normal Univ, Sch Artificial Intelligence, Beijing 100875, Peoples R China
[4] Cent Univ Finance & Econ, Beijing 100081, Peoples R China
[5] China Elect Technol Grp Corp, Res Inst 38, Hefei 230088, Peoples R China
[6] Hefei Univ Technol, Dept Biomed Engn, Hefei 230009, Peoples R China
[7] Cent Univ Finance & Econ, Sch Stat & Math, Beijing 102206, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Image fusion; infrared image; transformer; visible image; MULTI-FOCUS; NETWORK; FRAMEWORK;
DOI
10.1109/TIM.2023.3312755
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Existing transformer-based infrared and visible image fusion methods mainly exploit the intra-modal self-attention correlations within each image, yet they neglect the inter-modal discrepancies between the two source images at the same position, where the information carried by the infrared token and the visible token is unbalanced. Therefore, we develop a pure transformer fusion model that reconstructs the fused image in the token dimension: it not only perceives intra-modal long-range dependencies through the transformer's self-attention mechanism, but also captures the attentive inter-modal correlation in token space. Moreover, to enhance and balance the interaction of inter-modal tokens when the corresponding infrared and visible tokens are fused, learnable attentive weights dynamically measure the correlation of inter-modal tokens at the same position. Concretely, infrared and visible tokens are first computed by two independent transformers, which extract intra-modal long-range dependencies separately owing to the modal difference between the sources. The corresponding inter-modal infrared and visible tokens are then fused in token space to reconstruct the fused image. In addition, to comprehensively extract multiscale long-range dependencies and capture the attentive correlation of corresponding multimodal tokens at different token sizes, we extend the fusion to multigrained token-based fusion. Ablation studies and extensive experiments demonstrate the effectiveness and superiority of our model over nine state-of-the-art methods.
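The record does not include the paper's code. As a minimal sketch of the core idea in the abstract, learnable attentive weights that dynamically balance corresponding inter-modal tokens at the same position, one could gate each infrared/visible token pair with a softmax over per-modality scores. All names below are hypothetical, and the scalar-gating form is an assumption for illustration, not the paper's actual formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_tokens(ir_tokens, vis_tokens, w_ir, w_vis):
    """Fuse corresponding inter-modal tokens with learnable attentive weights.

    ir_tokens, vis_tokens: (num_tokens, dim) token sequences from the two
    independent intra-modal transformers; w_ir, w_vis: (dim,) learned score
    projections (one per modality).
    """
    s_ir = ir_tokens @ w_ir        # (num_tokens,) scalar score per IR token
    s_vis = vis_tokens @ w_vis     # (num_tokens,) scalar score per visible token
    # Softmax across the two modalities at each position, so the weights of
    # each corresponding token pair are balanced and sum to 1.
    w = softmax(np.stack([s_ir, s_vis], axis=-1))   # (num_tokens, 2)
    return w[:, :1] * ir_tokens + w[:, 1:] * vis_tokens
```

Because the two weights at each position form a convex combination, every fused token lies between its infrared and visible counterparts; in the multigrained setting this fusion would be repeated at several token sizes before reconstruction.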
Pages: 14