Mitigating Modality Discrepancies for RGB-T Semantic Segmentation

Cited by: 23
Authors
Zhao, Shenlu [1 ,2 ]
Liu, Yichen [1 ,2 ]
Jiao, Qiang [1 ,2 ]
Zhang, Qiang [1 ,2 ]
Han, Jungong [3 ]
Affiliations
[1] Xidian Univ, Key Lab Elect Equipment Struct Design, Minist Educ, Xian 710071, Shaanxi, Peoples R China
[2] Xidian Univ, Ctr Complex Syst, Sch Mechanoelect Engn, Xian 710071, Shaanxi, Peoples R China
[3] Aberystwyth Univ, Comp Sci Dept, Aberystwyth SY23 3FL, England
Funding
National Natural Science Foundation of China;
Keywords
Bridging-then-fusing; contextual information; dataset; modality discrepancy reduction; RGB-T semantic segmentation; NETWORK; CNN;
DOI
10.1109/TNNLS.2022.3233089
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Semantic segmentation models gain robustness against adverse illumination conditions by exploiting the complementary information in visible and thermal infrared (RGB-T) images. However, most existing RGB-T semantic segmentation models directly adopt primitive fusion strategies, such as elementwise summation, to integrate multimodal features. Such strategies overlook the modality discrepancies between the unimodal features produced by two independent feature extractors, thereby hindering the exploitation of cross-modal complementary information within the multimodal data. To this end, we propose a novel network for RGB-T semantic segmentation, i.e., MDRNet+, an improved version of our previous work ABMDRNet. At the core of MDRNet+ is a new bridging-then-fusing strategy, which mitigates modality discrepancies before cross-modal feature fusion. Concretely, an improved Modality Discrepancy Reduction (MDR+) subnetwork is designed, which first extracts unimodal features and then reduces their modality discrepancies. Afterward, discriminative multimodal features for RGB-T semantic segmentation are adaptively selected and integrated via several channel-weighted fusion (CWF) modules. Furthermore, a multiscale spatial context (MSC) module and a multiscale channel context (MCC) module are presented to effectively capture contextual information. Finally, we assemble a challenging RGB-T semantic segmentation dataset, i.e., RTSS, for urban scene understanding, mitigating the lack of well-annotated training data. Comprehensive experiments demonstrate that our model remarkably outperforms other state-of-the-art models on the MFNet, PST900, and RTSS datasets.
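To make the channel-weighted fusion idea concrete, below is a minimal PyTorch sketch of one plausible CWF-style module, assuming a squeeze-and-excitation-style gate that predicts per-channel blending weights from the concatenated unimodal features. The class name, reduction ratio, and gating design are illustrative assumptions, not the authors' actual implementation.

import torch
import torch.nn as nn

class ChannelWeightedFusion(nn.Module):
    """Illustrative channel-weighted fusion of RGB and thermal features.

    Predicts a weight in (0, 1) per channel from the concatenated
    unimodal features and blends the two modalities channel by channel.
    (Hypothetical sketch; not the paper's CWF module.)
    """
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global context per channel
        self.gate = nn.Sequential(           # squeeze-excitation-style gate
            nn.Conv2d(2 * channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # w decides, per channel, how much to take from each modality
        w = self.gate(self.pool(torch.cat([rgb, thermal], dim=1)))
        return w * rgb + (1.0 - w) * thermal

# Usage: fuse two 64-channel feature maps from independent encoders
cwf = ChannelWeightedFusion(channels=64)
fused = cwf(torch.randn(2, 64, 120, 160), torch.randn(2, 64, 120, 160))
print(fused.shape)  # torch.Size([2, 64, 120, 160])

Because the weights are computed jointly from both modalities, a channel that is unreliable in one stream (e.g., RGB under low illumination) can be adaptively down-weighted in favor of the other.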
Pages: 9380 - 9394
Number of pages: 15
Related Papers
50 records in total
  • [21] AMNet: Learning to Align Multi-Modality for RGB-T Tracking
    Zhang, Tianlu
    He, Xiaoyi
    Jiao, Qiang
    Zhang, Qiang
    Han, Jungong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7386 - 7400
  • [22] Efficient RGB-T Tracking via Cross-Modality Distillation
    Zhang, Tianlu
    Guo, Hongyuan
    Jiao, Qiang
    Zhang, Qiang
    Han, Jungong
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 5404 - 5413
  • [23] MMNet: Multi-modal multi-stage network for RGB-T image semantic segmentation
    Lan, Xin
    Gu, Xiaojing
    Gu, Xingsheng
    APPLIED INTELLIGENCE, 2022, 52 : 5817 - 5829
  • [24] RGB-T semantic segmentation based on cross-operational fusion attention in autonomous driving scenario
    Zhang, Jiyou
    Zhang, Rongfen
    Yuan, Wenhao
    Liu, Yuhong
    EVOLVING SYSTEMS, 2024, 15 (04) : 1429 - 1440
  • [25] MMSMCNet: Modal Memory Sharing and Morphological Complementary Networks for RGB-T Urban Scene Semantic Segmentation
    Zhou, Wujie
    Zhang, Han
    Yan, Weiqing
    Lin, Weisi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12) : 7096 - 7108
  • [26] Cross-level interaction fusion network-based RGB-T semantic segmentation for distant targets
    Chen, Yu
    Li, Xiang
    Luan, Chao
    Hou, Weimin
    Liu, Haochen
    Zhu, Zihui
    Xue, Lian
    Zhang, Jianqi
    Liu, Delian
    Wu, Xin
    Wei, Linfang
    Jian, Chaochao
    Li, Jinze
    PATTERN RECOGNITION, 2025, 161
  • [27] MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking
    Wang, Xiao
    Shu, Xiujun
    Zhang, Shiliang
    Jiang, Bo
    Wang, Yaowei
    Tian, Yonghong
    Wu, Feng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 4335 - 4348
  • [28] Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking
    Luo, Yang
    Guo, Xiqing
    Dong, Mingtao
    Yu, Jin
    SENSORS, 2023, 23 (14)
  • [29] RGB-T tracking by modality difference reduction and feature re-selection
    Zhang, Qiang
    Liu, Xueru
    Zhang, Tianlu
    IMAGE AND VISION COMPUTING, 2022, 127
  • [30] CGINet: Cross-modality grade interaction network for RGB-T crowd counting
    Pan, Yi
    Zhou, Wujie
    Qian, Xiaohong
    Mao, Shanshan
    Yang, Rongwang
    Yu, Lu
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126