Mitigating Modality Discrepancies for RGB-T Semantic Segmentation

Cited by: 23
Authors
Zhao, Shenlu [1 ,2 ]
Liu, Yichen [1 ,2 ]
Jiao, Qiang [1 ,2 ]
Zhang, Qiang [1 ,2 ]
Han, Jungong [3 ]
Affiliations
[1] Xidian Univ, Key Lab Elect Equipment Struct Design, Minist Educ, Xian 710071, Shaanxi, Peoples R China
[2] Xidian Univ, Ctr Complex Syst, Sch Mechanoelect Engn, Xian 710071, Shaanxi, Peoples R China
[3] Aberystwyth Univ, Comp Sci Dept, Aberystwyth SY23 3FL, England
Funding
National Natural Science Foundation of China
Keywords
Bridging-then-fusing; contextual information; dataset; modality discrepancy reduction; RGB-T semantic segmentation; NETWORK; CNN;
DOI
10.1109/TNNLS.2022.3233089
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Semantic segmentation models gain robustness against adverse illumination conditions by exploiting complementary information from visible and thermal infrared (RGB-T) images. Despite its importance, most existing RGB-T semantic segmentation models directly adopt primitive fusion strategies, such as elementwise summation, to integrate multimodal features. Such strategies, unfortunately, overlook the modality discrepancies caused by inconsistent unimodal features obtained from two independent feature extractors, thus hindering the exploitation of cross-modal complementary information within the multimodal data. To address this, we propose a novel network for RGB-T semantic segmentation, i.e., MDRNet+, an improved version of our previous work ABMDRNet. At the core of MDRNet+ is a new idea, termed the bridging-then-fusing strategy, which mitigates modality discrepancies before cross-modal feature fusion. Concretely, an improved modality discrepancy reduction (MDR+) subnetwork is designed, which first extracts unimodal features and then reduces their modality discrepancies. Afterward, discriminative multimodal features for RGB-T semantic segmentation are adaptively selected and integrated via several channel-weighted fusion (CWF) modules. Furthermore, a multiscale spatial context (MSC) module and a multiscale channel context (MCC) module are presented to effectively capture contextual information. Finally, we assemble a challenging RGB-T semantic segmentation dataset, i.e., RTSS, for urban scene understanding, to mitigate the lack of well-annotated training data. Comprehensive experiments demonstrate that our proposed model remarkably surpasses other state-of-the-art models on the MFNet, PST900, and RTSS datasets.
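The abstract contrasts primitive elementwise summation with channel-weighted fusion (CWF), where each channel's contribution from the RGB and thermal streams is weighted adaptively rather than summed blindly. The following is a minimal numpy sketch of that general idea; the gating rule (a sigmoid over globally pooled unimodal responses) and the function name are illustrative assumptions, not the paper's exact CWF formulation.

```python
import numpy as np

def channel_weighted_fusion(feat_rgb, feat_t):
    """Illustrative channel-weighted fusion of two unimodal feature maps.

    feat_rgb, feat_t: arrays of shape (C, H, W). For each channel, a
    scalar weight in (0, 1) blends the two modalities, instead of the
    plain elementwise sum feat_rgb + feat_t.
    """
    # Global average pooling per channel -> shape (C,)
    g_rgb = feat_rgb.mean(axis=(1, 2))
    g_t = feat_t.mean(axis=(1, 2))
    # Sigmoid gate on the pooled difference: channels where the RGB
    # stream responds more strongly receive a weight closer to 1.
    # (This gating rule is an assumed stand-in for the learned CWF.)
    w = 1.0 / (1.0 + np.exp(-(g_rgb - g_t)))  # shape (C,)
    w = w[:, None, None]                      # broadcast to (C, 1, 1)
    return w * feat_rgb + (1.0 - w) * feat_t
```

When both modalities respond equally on a channel, the gate reduces to 0.5 and the fusion degenerates to a plain average; in the paper the weights are produced by learned layers rather than a fixed pooled-difference rule.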
Pages: 9380-9394 (15 pages)