Cross-level interaction fusion network-based RGB-T semantic segmentation for distant targets

Cited by: 0
Authors
Chen, Yu [1 ]
Li, Xiang [1 ]
Luan, Chao [2 ]
Hou, Weimin [2 ]
Liu, Haochen [2 ]
Zhu, Zihui [3 ]
Xue, Lian [3 ]
Zhang, Jianqi [1 ]
Liu, Delian [1 ]
Wu, Xin [1 ]
Wei, Linfang [1 ]
Jian, Chaochao [1 ]
Li, Jinze [1 ]
Affiliations
[1] Xidian Univ, Sch Optoelect Engn, Xian 710071, Peoples R China
[2] Beijing Inst Control & Elect Technol, Beijing 100038, Peoples R China
[3] Natl Key Lab Sci & Technol Test Phys & Numer Math, Beijing 100076, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Semantic segmentation; Feature fusion; Cross modality; Multi-scale information; Distant object;
DOI
10.1016/j.patcog.2024.111218
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
RGB-T segmentation is an emerging approach driven by advances in multispectral detection and is poised to replace traditional RGB-only segmentation. An effective cross-modality feature fusion module is essential to this technology, and the precise segmentation of distant objects is another significant challenge. Focusing on these two problems, we propose an end-to-end distant object feature fusion network (DOFFNet) for RGB-T segmentation. First, we introduce a cross-level interaction fusion strategy (CLIF) and an inter-correlation fusion method (IFFM) in the encoder to enrich multi-scale feature expression and improve fusion accuracy. Then, we propose a residual dense pixel convolution (R-DPC) in the decoder with a trainable upsampling unit that dynamically reconstructs information lost during encoding, particularly for distant objects whose features may vanish after pooling. Experimental results show that DOFFNet achieves the highest mean pixel accuracy, 75.8%, and substantially improves accuracy for four classes, including objects that occupy as little as 0.2%-2% of the total pixels. This improvement yields more reliable performance in practical applications where small-object detection is critical, and the approach also shows potential applicability to other fields such as medical imaging and remote sensing.
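Note: the following is a minimal, illustrative PyTorch sketch of the two general ideas named in the abstract (cross-level, cross-modality feature fusion in the encoder, and a decoder block with a trainable upsampling unit). It is not the authors' DOFFNet code; all module names, channel sizes, and design details below are assumptions made for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLevelFusion(nn.Module):
    """Fuse shallow RGB and thermal features with upsampled deeper context.

    The deeper (coarser) fused features are interpolated to the shallow
    resolution, concatenated with both modalities, and re-weighted by a
    learned gate, mixing multi-scale and cross-modality information.
    """
    def __init__(self, shallow_ch, deep_ch, out_ch):
        super().__init__()
        self.reduce = nn.Conv2d(2 * shallow_ch + deep_ch, out_ch, kernel_size=1)
        self.gate = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_shallow, thermal_shallow, fused_deep):
        fused_deep = F.interpolate(fused_deep, size=rgb_shallow.shape[-2:],
                                   mode="bilinear", align_corners=False)
        x = self.reduce(torch.cat([rgb_shallow, thermal_shallow, fused_deep], dim=1))
        return x * self.gate(x)  # gated cross-level, cross-modality mixture

class TrainableUpsampleBlock(nn.Module):
    """Decoder block whose upsampling weights are learned (sub-pixel conv).

    A convolution predicts scale*scale*out_ch channels that PixelShuffle
    rearranges into a larger feature map, followed by residual refinement,
    instead of fixed bilinear interpolation.
    """
    def __init__(self, in_ch, out_ch, scale=2):
        super().__init__()
        self.expand = nn.Conv2d(in_ch, out_ch * scale * scale, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)
        self.refine = nn.Sequential(
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        x = self.shuffle(self.expand(x))
        return x + self.refine(x)  # residual connection around the refinement

if __name__ == "__main__":
    rgb = torch.randn(1, 64, 60, 80)      # shallow RGB encoder features
    thermal = torch.randn(1, 64, 60, 80)  # shallow thermal encoder features
    deep = torch.randn(1, 128, 30, 40)    # fused deeper encoder features
    fused = CrossLevelFusion(64, 128, 64)(rgb, thermal, deep)
    up = TrainableUpsampleBlock(64, 32)(fused)
    print(fused.shape, up.shape)          # (1, 64, 60, 80), (1, 32, 120, 160)

Learned sub-pixel upsampling and gated cross-level fusion of this kind are common ways to retain small, distant objects that fixed interpolation tends to blur; the paper's actual CLIF, IFFM, and R-DPC modules will differ in their details.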
Pages: 13
Related Papers
50 records in total
  • [31] CGFNet: cross-guided fusion network for RGB-thermal semantic segmentation
    Fu, Yanping
    Chen, Qiaoqiao
    Zhao, Haifeng
    THE VISUAL COMPUTER, 2022, 38 : 3243 - 3252
  • [32] Attention-based fusion network for RGB-D semantic segmentation
    Zhong, Li
    Guo, Chi
    Zhan, Jiao
    Deng, JingYi
    NEUROCOMPUTING, 2024, 608
  • [33] CIFG-Net: Cross-level information fusion and guidance network for Polyp Segmentation
    Li, Weisheng
    Huang, Zhaopeng
    Li, Feiyan
    Zhao, Yinghui
    Zhang, Hongchuan
    COMPUTERS IN BIOLOGY AND MEDICINE, 2024, 169
  • [34] CACFNet: Cross-Modal Attention Cascaded Fusion Network for RGB-T Urban Scene Parsing
    Zhou, Wujie
    Dong, Shaohua
    Fang, Meixin
    Yu, Lu
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01): 1919 - 1929
  • [35] ABMDRNet: Adaptive-weighted Bi-directional Modality Difference Reduction Network for RGB-T Semantic Segmentation
    Zhang, Qiang
    Zhao, Shenlu
    Luo, Yongjiang
    Zhang, Dingwen
    Huang, Nianchang
    Han, Jungong
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 2633 - 2642
  • [36] BMDENet: Bi-Directional Modality Difference Elimination Network for Few-Shot RGB-T Semantic Segmentation
    Zhao, Ying
    Song, Kechen
    Zhang, Yiming
    Yan, Yunhui
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS, 2023, 70 (11) : 4266 - 4270
  • [37] RGB-D Salient Object Detection Based on Cross-Modal and Cross-Level Feature Fusion
    Peng, Yanbin
    Zhai, Zhinian
    Feng, Mingkun
    IEEE ACCESS, 2024, 12 : 45134 - 45146
  • [40] An improved deep network-based RGB-D semantic segmentation method for indoor scenes
    Ni, Jianjun
    Zhang, Ziru
    Shen, Kang
    Tang, Guangyi
    Yang, Simon X.
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024, 15 (02) : 589 - 604