Improved Multi-modal Image Fusion with Attention and Dense Networks: Visual and Quantitative Evaluation

Times Cited: 0
Authors
Banerjee, Ankan [1 ]
Patra, Dipti [1 ]
Roy, Pradipta [2 ]
Affiliations
[1] Natl Inst Technol, Rourkela, India
[2] DRDO, Integrated Test Range, Candipur, India
Keywords
image fusion; attention; human perception; Convolutional Block Attention Module
DOI
10.1007/978-3-031-58535-7_20
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Numbers
081104; 0812; 0835; 1405
Abstract
This article introduces a novel multi-modal image fusion approach based on the Convolutional Block Attention Module (CBAM) and dense networks, aimed at enhancing the human perceptual quality and information content of the fused images. As a pre-processing step, the proposed model preserves the edges of the infrared images and enhances the contrast of the visible images. The CBAM then extracts more refined features from the source images. Visual results show that the fused images produced by the proposed method are superior to those generated by most standard fusion techniques. To substantiate these findings, a quantitative analysis is conducted using several metrics: the proposed method achieves the best scores on the Naturalness Image Quality Evaluator (NIQE) and the Chen-Varshney metric, both human-perception-based measures, and its fused images exhibit the highest Standard Deviation, indicating enhanced contrast. These results confirm that the proposed multi-modal image fusion technique outperforms standard methods both qualitatively and quantitatively, producing fused images with improved human perceptual quality.
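As a rough illustration of the attention mechanism named in the abstract, the following is a minimal PyTorch sketch of a Convolutional Block Attention Module, assuming the standard formulation of Woo et al. (2018): a channel-attention gate followed by a spatial-attention gate. The layer sizes, the surrounding dense network, and the fusion strategy used by the authors are not given in this record, so everything below is illustrative rather than a reproduction of the paper's model.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: squeeze spatial dims with average- and max-pooling,
    pass both descriptors through a shared MLP, and gate channels with a sigmoid."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return x * torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Spatial attention: concatenate channel-wise average and max maps,
    then convolve them down to a single-channel sigmoid gate."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in CBAM."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.sa(self.ca(x))

# Toy usage: refine a 64-channel feature map extracted from a source image.
feats = torch.randn(1, 64, 128, 128)
refined = CBAM(64)(feats)
print(refined.shape)  # torch.Size([1, 64, 128, 128])

Applying the channel gate before the spatial gate follows the sequential arrangement reported to work best in the original CBAM paper; how the paper under review wires CBAM into its dense network is not specified here.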
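The abstract also cites the Standard Deviation of the fused image as a contrast indicator. Below is a small sketch of that metric under the common convention of computing it over grayscale pixel intensities; whether the paper computes it per channel or after grayscale conversion is an assumption here, not taken from the source.

import numpy as np

def fusion_std(image: np.ndarray) -> float:
    """Population standard deviation of a grayscale image in [0, 255];
    a larger value is commonly read as higher global contrast."""
    img = image.astype(np.float64)
    return float(np.sqrt(np.mean((img - img.mean()) ** 2)))

# Toy example: a higher-contrast image yields a larger SD.
flat = np.full((64, 64), 128.0)                          # uniform gray
contrasty = np.tile(np.array([[0.0, 255.0]]), (64, 32))  # alternating stripes
print(fusion_std(flat), fusion_std(contrasty))  # 0.0 vs. 127.5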
Pages: 237-248 (12 pages)