Mitigating Modality Discrepancies for RGB-T Semantic Segmentation

Cited by: 23
Authors
Zhao, Shenlu [1 ,2 ]
Liu, Yichen [1 ,2 ]
Jiao, Qiang [1 ,2 ]
Zhang, Qiang [1 ,2 ]
Han, Jungong [3 ]
Affiliations
[1] Xidian Univ, Key Lab Elect Equipment Struct Design, Minist Educ, Xian 710071, Shaanxi, Peoples R China
[2] Xidian Univ, Ctr Complex Syst, Sch Mechanoelect Engn, Xian 710071, Shaanxi, Peoples R China
[3] Aberystwyth Univ, Comp Sci Dept, Aberystwyth SY23 3FL, England
Funding
National Natural Science Foundation of China;
Keywords
Bridging-then-fusing; contextual information; dataset; modality discrepancy reduction; RGB-T semantic segmentation; NETWORK; CNN;
DOI
10.1109/TNNLS.2022.3233089
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Semantic segmentation models gain robustness against adverse illumination conditions by exploiting the complementary information in visible and thermal infrared (RGB-T) images. Nevertheless, most existing RGB-T semantic segmentation models directly adopt primitive fusion strategies, such as elementwise summation, to integrate multimodal features. Such strategies overlook the modality discrepancies caused by inconsistent unimodal features obtained from two independent feature extractors, and thus hinder the exploitation of cross-modal complementary information within the multimodal data. To address this, we propose a novel network for RGB-T semantic segmentation, i.e., MDRNet+, an improved version of our previous work ABMDRNet. The core of MDRNet+ is a new strategy, termed bridging-then-fusing, which mitigates modality discrepancies before cross-modal feature fusion. Concretely, an improved modality discrepancy reduction (MDR+) subnetwork first extracts unimodal features and then reduces their modality discrepancies. Afterward, discriminative multimodal features for RGB-T semantic segmentation are adaptively selected and integrated via several channel-weighted fusion (CWF) modules. Furthermore, a multiscale spatial context (MSC) module and a multiscale channel context (MCC) module are presented to effectively capture contextual information. Finally, we assemble a challenging RGB-T semantic segmentation dataset for urban scene understanding, i.e., RTSS, to mitigate the lack of well-annotated training data. Comprehensive experiments demonstrate that the proposed model remarkably surpasses state-of-the-art models on the MFNet, PST900, and RTSS datasets.
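The channel-weighted fusion (CWF) step described in the abstract can be pictured with a short sketch. The PyTorch code below is a minimal illustration assuming a squeeze-and-excitation-style gating design; the class name ChannelWeightedFusion, the reduction parameter, and all structural details are assumptions for illustration, not the paper's actual implementation.

```python
# A minimal sketch of channel-weighted fusion in the spirit of the paper's CWF
# modules. All names and design choices here are hypothetical; the published
# architecture may differ.
import torch
import torch.nn as nn

class ChannelWeightedFusion(nn.Module):
    """Fuses RGB and thermal feature maps with learned per-channel weights."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Gate computed from the concatenated modalities (assumed design).
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                           # global channel statistics
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),                                      # per-channel weights in (0, 1)
        )

    def forward(self, rgb: torch.Tensor, thermal: torch.Tensor) -> torch.Tensor:
        x = torch.cat([rgb, thermal], dim=1)                   # (B, 2C, H, W)
        w = self.gate(x)                                       # (B, 2C, 1, 1)
        w_rgb, w_t = w.chunk(2, dim=1)                         # one weight set per modality
        # Weighted sum instead of plain elementwise summation, so the network
        # can adaptively emphasize the more reliable modality per channel.
        return w_rgb * rgb + w_t * thermal

# Usage: fuse two 64-channel feature maps from the unimodal encoders.
fuse = ChannelWeightedFusion(channels=64)
out = fuse(torch.randn(2, 64, 30, 40), torch.randn(2, 64, 30, 40))
```

The gating replaces the "primitive" elementwise summation criticized in the abstract: instead of fixed equal weights, the fused output adapts per channel, which matters when one modality degrades (e.g., RGB at night).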
Pages: 9380 - 9394
Number of pages: 15
Related Papers
50 records in total
  • [31] Robust RGB-T Tracking via Adaptive Modality Weight Correlation Filters and Cross-modality Learning
    Zhou, Mingliang
    Zhao, Xinwen
    Luo, Futing
    Luo, Jun
    Pu, Huayan
    Xiang, Tao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (04)
  • [32] Leveraging modality-specific and shared features for RGB-T salient object detection
    Wang, Shuo
    Yang, Gang
    Xu, Qiqi
    Dai, Xun
    IET COMPUTER VISION, 2024, 18 (08) : 1285 - 1299
  • [33] Semantic-guided fusion for multiple object tracking and RGB-T tracking
    Liu, Xiaohu
    Luo, Yichuang
    Zhang, Yan
    Lei, Zhiyong
    IET IMAGE PROCESSING, 2023, 17 (11) : 3281 - 3291
  • [34] Modality-Induced Transfer-Fusion Network for RGB-D and RGB-T Salient Object Detection
    Chen, Gang
    Shao, Feng
    Chai, Xiongli
    Chen, Hangwei
    Jiang, Qiuping
    Meng, Xiangchao
    Ho, Yo-Sung
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (04) : 1787 - 1801
  • [35] UMINet: a unified multi-modality interaction network for RGB-D and RGB-T salient object detection
    Gao, Lina
    Fu, Ping
    Xu, Mingzhu
    Wang, Tiantian
    Liu, Bing
    VISUAL COMPUTER, 2024, 40 (03) : 1565 - 1582
  • [37] Multiscale Modality-Similar Learning Guided Weakly Supervised RGB-T Crowd Counting
    Kong, Weihang
    Li, He
    Zhao, Fengda
    IEEE SENSORS JOURNAL, 2024, 24 (18) : 29121 - 29134
  • [38] Two-stage modality-graphs regularized manifold ranking for RGB-T tracking
    Li, Chenglong
    Zhu, Chengli
    Zheng, Shaofei
    Luo, Bin
    Tang, Jing
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2018, 68 : 207 - 217
  • [39] A Survey of RGB-T Object Tracking
    Ding, Zhengtong
    Xu, Lei
    Zhang, Yan
    Li, Piaoyang
    Li, Yangyang
    Luo, Bin
    Tu, Zhengzheng
    JOURNAL OF NANJING UNIVERSITY OF INFORMATION SCIENCE & TECHNOLOGY (NATURAL SCIENCE EDITION), 2019, 11 (06) : 690 - 697
  • [40] Channel Exchanging for RGB-T Tracking
    Zhao, Long
    Zhu, Meng
    Ren, Honge
    Xue, Lingjixuan
    SENSORS, 2021, 21 (17)