Improved Multi-modal Image Fusion with Attention and Dense Networks: Visual and Quantitative Evaluation

Times Cited: 0
Authors
Banerjee, Ankan [1 ]
Patra, Dipti [1 ]
Roy, Pradipta [2 ]
Affiliations
[1] Natl Inst Technol, Rourkela, India
[2] DRDO, Integrated Test Range, Candipur, India
Keywords
image fusion; attention; human perception; Convolutional Block Attention Module
DOI
10.1007/978-3-031-58535-7_20
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Numbers
081104; 0812; 0835; 1405
Abstract
This article introduces a novel multi-modal image fusion approach based on the Convolutional Block Attention Module (CBAM) and dense networks, aimed at enhancing the human perceptual quality and information content of the fused images. As a pre-processing step, the proposed model preserves the edges of the infrared images and enhances the contrast of the visible images. The CBAM then extracts more refined features from the source images. Visual results show that the fused images produced by the proposed method are superior to those generated by most standard fusion techniques. To substantiate these findings, a quantitative analysis is conducted using several metrics: the proposed method achieves the best scores on the Naturalness Image Quality Evaluator (NIQE) and the Chen-Varshney metric, both human-perception-based measures, and its fused images exhibit the highest Standard Deviation, indicating enhanced contrast. These results confirm that the proposed multi-modal image fusion technique outperforms standard methods both qualitatively and quantitatively, producing fused images with improved human perceptual quality.
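As a rough illustration of the attention mechanism named in the abstract, the following is a minimal PyTorch sketch of a Convolutional Block Attention Module, assuming the standard formulation of Woo et al. (2018): a channel-attention gate followed by a spatial-attention gate. The layer sizes, the surrounding dense network, and the fusion strategy used by the authors are not given in this record, so everything below is illustrative rather than a reproduction of the paper's model.

import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: squeeze spatial dims with average- and max-pooling,
    pass both descriptors through a shared MLP, and gate channels with a sigmoid."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return x * torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Spatial attention: concatenate channel-wise average and max maps,
    then convolve them down to a single-channel sigmoid gate."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in CBAM."""
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.sa(self.ca(x))

# Toy usage: refine a 64-channel feature map extracted from a source image.
feats = torch.randn(1, 64, 128, 128)
refined = CBAM(64)(feats)
print(refined.shape)  # torch.Size([1, 64, 128, 128])

Applying the channel gate before the spatial gate follows the sequential arrangement reported to work best in the original CBAM paper; how the paper under review wires CBAM into its dense network is not specified here.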
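The abstract also cites the Standard Deviation of the fused image as a contrast indicator. Below is a small sketch of that metric under the common convention of computing it over grayscale pixel intensities; whether the paper computes it per channel or after grayscale conversion is an assumption here, not taken from the source.

import numpy as np

def fusion_std(image: np.ndarray) -> float:
    """Population standard deviation of a grayscale image in [0, 255];
    a larger value is commonly read as higher global contrast."""
    img = image.astype(np.float64)
    return float(np.sqrt(np.mean((img - img.mean()) ** 2)))

# Toy example: a higher-contrast image yields a larger SD.
flat = np.full((64, 64), 128.0)                          # uniform gray
contrasty = np.tile(np.array([[0.0, 255.0]]), (64, 32))  # alternating stripes
print(fusion_std(flat), fusion_std(contrasty))  # 0.0 vs. 127.5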
Pages: 237-248 (12 pages)