Mitigating Modality Discrepancies for RGB-T Semantic Segmentation

Cited by: 23
Authors
Zhao, Shenlu [1 ,2 ]
Liu, Yichen [1 ,2 ]
Jiao, Qiang [1 ,2 ]
Zhang, Qiang [1 ,2 ]
Han, Jungong [3 ]
Affiliations
[1] Xidian Univ, Key Lab Elect Equipment Struct Design, Minist Educ, Xian 710071, Shaanxi, Peoples R China
[2] Xidian Univ, Ctr Complex Syst, Sch Mechanoelect Engn, Xian 710071, Shaanxi, Peoples R China
[3] Aberystwyth Univ, Comp Sci Dept, Aberystwyth SY23 3FL, England
Funding
National Natural Science Foundation of China;
Keywords
Bridging-then-fusing; contextual information; dataset; modality discrepancy reduction; RGB-T semantic segmentation; NETWORK; CNN;
DOI
10.1109/TNNLS.2022.3233089
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Semantic segmentation models gain robustness against adverse illumination conditions by exploiting the complementary information in visible and thermal infrared (RGB-T) images. However, most existing RGB-T semantic segmentation models directly adopt primitive fusion strategies, such as elementwise summation, to integrate multimodal features. Such strategies overlook the modality discrepancies between the unimodal features produced by two independent feature extractors, thereby hindering the exploitation of cross-modal complementary information within the multimodal data. To this end, we propose a novel network for RGB-T semantic segmentation, i.e., MDRNet+, an improved version of our previous work ABMDRNet. At the core of MDRNet+ is a new bridging-then-fusing strategy, which mitigates modality discrepancies before cross-modal feature fusion. Concretely, an improved Modality Discrepancy Reduction (MDR+) subnetwork is designed, which first extracts unimodal features and then reduces their modality discrepancies. Afterward, discriminative multimodal features for RGB-T semantic segmentation are adaptively selected and integrated via several channel-weighted fusion (CWF) modules. Furthermore, a multiscale spatial context (MSC) module and a multiscale channel context (MCC) module are presented to effectively capture contextual information. Finally, we assemble a challenging RGB-T semantic segmentation dataset, i.e., RTSS, for urban scene understanding, mitigating the lack of well-annotated training data. Comprehensive experiments demonstrate that our model remarkably outperforms other state-of-the-art models on the MFNet, PST900, and RTSS datasets.
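To make the channel-weighted fusion idea concrete, below is a minimal PyTorch sketch of one plausible CWF-style module, assuming a squeeze-and-excitation-style gate that predicts per-channel blending weights from the concatenated unimodal features. The class name, reduction ratio, and gating design are illustrative assumptions, not the authors' actual implementation.

import torch
import torch.nn as nn

class ChannelWeightedFusion(nn.Module):
    """Illustrative channel-weighted fusion of RGB and thermal features.

    Predicts a weight in (0, 1) per channel from the concatenated
    unimodal features and blends the two modalities channel by channel.
    (Hypothetical sketch; not the paper's CWF module.)
    """
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global context per channel
        self.gate = nn.Sequential(           # squeeze-excitation-style gate
            nn.Conv2d(2 * channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        # w decides, per channel, how much to take from each modality
        w = self.gate(self.pool(torch.cat([rgb, thermal], dim=1)))
        return w * rgb + (1.0 - w) * thermal

# Usage: fuse two 64-channel feature maps from independent encoders
cwf = ChannelWeightedFusion(channels=64)
fused = cwf(torch.randn(2, 64, 120, 160), torch.randn(2, 64, 120, 160))
print(fused.shape)  # torch.Size([2, 64, 120, 160])

Because the weights are computed jointly from both modalities, a channel that is unreliable in one stream (e.g., RGB under low illumination) can be adaptively down-weighted in favor of the other.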
Pages: 9380 - 9394
Number of pages: 15
Related Papers
50 records in total
  • [21] AMNet: Learning to Align Multi-Modality for RGB-T Tracking
    Zhang, Tianlu
    He, Xiaoyi
    Jiao, Qiang
    Zhang, Qiang
    Han, Jungong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7386 - 7400
  • [22] Efficient RGB-T Tracking via Cross-Modality Distillation
    Zhang, Tianlu
    Guo, Hongyuan
    Jiao, Qiang
    Zhang, Qiang
    Han, Jungong
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 5404 - 5413
  • [23] MMNet: Multi-modal multi-stage network for RGB-T image semantic segmentation
    Lan, Xin
    Gu, Xiaojing
    Gu, Xingsheng
    APPLIED INTELLIGENCE, 2022, 52 : 5817 - 5829
  • [24] RGB-T semantic segmentation based on cross-operational fusion attention in autonomous driving scenario
    Zhang, Jiyou
    Zhang, Rongfen
    Yuan, Wenhao
    Liu, Yuhong
    EVOLVING SYSTEMS, 2024, 15 (04) : 1429 - 1440
  • [25] MMSMCNet: Modal Memory Sharing and Morphological Complementary Networks for RGB-T Urban Scene Semantic Segmentation
    Zhou, Wujie
    Zhang, Han
    Yan, Weiqing
    Lin, Weisi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (12) : 7096 - 7108
  • [26] Cross-level interaction fusion network-based RGB-T semantic segmentation for distant targets
    Chen, Yu
    Li, Xiang
    Luan, Chao
    Hou, Weimin
    Liu, Haochen
    Zhu, Zihui
    Xue, Lian
    Zhang, Jianqi
    Liu, Delian
    Wu, Xin
    Wei, Linfang
    Jian, Chaochao
    Li, Jinze
    PATTERN RECOGNITION, 2025, 161
  • [27] MFGNet: Dynamic Modality-Aware Filter Generation for RGB-T Tracking
    Wang, Xiao
    Shu, Xiujun
    Zhang, Shiliang
    Jiang, Bo
    Wang, Yaowei
    Tian, Yonghong
    Wu, Feng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 4335 - 4348
  • [28] Learning Modality Complementary Features with Mixed Attention Mechanism for RGB-T Tracking
    Luo, Yang
    Guo, Xiqing
    Dong, Mingtao
    Yu, Jin
    SENSORS, 2023, 23 (14)
  • [29] RGB-T tracking by modality difference reduction and feature re-selection
    Zhang, Qiang
    Liu, Xueru
    Zhang, Tianlu
    IMAGE AND VISION COMPUTING, 2022, 127
  • [30] CGINet: Cross-modality grade interaction network for RGB-T crowd counting
    Pan, Yi
    Zhou, Wujie
    Qian, Xiaohong
    Mao, Shanshan
    Yang, Rongwang
    Yu, Lu
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126