Mitigating Modality Discrepancies for RGB-T Semantic Segmentation

Cited by: 23
Authors
Zhao, Shenlu [1 ,2 ]
Liu, Yichen [1 ,2 ]
Jiao, Qiang [1 ,2 ]
Zhang, Qiang [1 ,2 ]
Han, Jungong [3 ]
Affiliations
[1] Xidian Univ, Key Lab Elect Equipment Struct Design, Minist Educ, Xian 710071, Shaanxi, Peoples R China
[2] Xidian Univ, Ctr Complex Syst, Sch Mechanoelect Engn, Xian 710071, Shaanxi, Peoples R China
[3] Aberystwyth Univ, Comp Sci Dept, Aberystwyth SY23 3FL, England
Funding
National Natural Science Foundation of China
Keywords
Bridging-then-fusing; contextual information; dataset; modality discrepancy reduction; RGB-T semantic segmentation; NETWORK; CNN;
DOI
10.1109/TNNLS.2022.3233089
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Semantic segmentation models gain robustness against adverse illumination conditions by exploiting complementary information from visible and thermal infrared (RGB-T) images. Despite its importance, most existing RGB-T semantic segmentation models directly adopt primitive fusion strategies, such as elementwise summation, to integrate multimodal features. Such strategies, unfortunately, overlook the modality discrepancies caused by inconsistent unimodal features obtained from two independent feature extractors, thus hindering the exploitation of cross-modal complementary information within the multimodal data. To address this, we propose a novel network for RGB-T semantic segmentation, i.e., MDRNet+, an improved version of our previous work ABMDRNet. At the core of MDRNet+ is a new idea, termed the bridging-then-fusing strategy, which mitigates modality discrepancies before cross-modal feature fusion. Concretely, an improved modality discrepancy reduction (MDR+) subnetwork is designed, which first extracts unimodal features and then reduces their modality discrepancies. Afterward, discriminative multimodal features for RGB-T semantic segmentation are adaptively selected and integrated via several channel-weighted fusion (CWF) modules. Furthermore, a multiscale spatial context (MSC) module and a multiscale channel context (MCC) module are presented to effectively capture contextual information. Finally, we assemble a challenging RGB-T semantic segmentation dataset, i.e., RTSS, for urban scene understanding, to mitigate the lack of well-annotated training data. Comprehensive experiments demonstrate that our proposed model remarkably surpasses other state-of-the-art models on the MFNet, PST900, and RTSS datasets.
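The abstract contrasts primitive elementwise summation with channel-weighted fusion (CWF), where each channel's contribution from the RGB and thermal streams is weighted adaptively rather than summed blindly. The following is a minimal numpy sketch of that general idea; the gating rule (a sigmoid over globally pooled unimodal responses) and the function name are illustrative assumptions, not the paper's exact CWF formulation.

```python
import numpy as np

def channel_weighted_fusion(feat_rgb, feat_t):
    """Illustrative channel-weighted fusion of two unimodal feature maps.

    feat_rgb, feat_t: arrays of shape (C, H, W). For each channel, a
    scalar weight in (0, 1) blends the two modalities, instead of the
    plain elementwise sum feat_rgb + feat_t.
    """
    # Global average pooling per channel -> shape (C,)
    g_rgb = feat_rgb.mean(axis=(1, 2))
    g_t = feat_t.mean(axis=(1, 2))
    # Sigmoid gate on the pooled difference: channels where the RGB
    # stream responds more strongly receive a weight closer to 1.
    # (This gating rule is an assumed stand-in for the learned CWF.)
    w = 1.0 / (1.0 + np.exp(-(g_rgb - g_t)))  # shape (C,)
    w = w[:, None, None]                      # broadcast to (C, 1, 1)
    return w * feat_rgb + (1.0 - w) * feat_t
```

When both modalities respond equally on a channel, the gate reduces to 0.5 and the fusion degenerates to a plain average; in the paper the weights are produced by learned layers rather than a fixed pooled-difference rule.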
Pages: 9380-9394 (15 pages)