Cross-level interaction fusion network-based RGB-T semantic segmentation for distant targets

Citations: 0
Authors
Chen, Yu [1 ]
Li, Xiang [1 ]
Luan, Chao [2 ]
Hou, Weimin [2 ]
Liu, Haochen [2 ]
Zhu, Zihui [3 ]
Xue, Lian [3 ]
Zhang, Jianqi [1 ]
Liu, Delian [1 ]
Wu, Xin [1 ]
Wei, Linfang [1 ]
Jian, Chaochao [1 ]
Li, Jinze [1 ]
Affiliations
[1] Xidian Univ, Sch Optoelect Engn, Xian 710071, Peoples R China
[2] Beijing Inst Control & Elect Technol, Beijing 100038, Peoples R China
[3] Natl Key Lab Sci & Technol Test Phys & Numer Math, Beijing 100076, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Semantic segmentation; Feature fusion; Cross modality; Multi-scale information; Distant object;
DOI
10.1016/j.patcog.2024.111218
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
RGB-T segmentation is an emerging approach driven by advances in multispectral detection and is poised to replace traditional RGB-only segmentation. An effective cross-modality feature fusion module is essential to this technology, and the precise segmentation of distant objects is a further significant challenge. Focusing on these two problems, we propose an end-to-end distant object feature fusion network (DOFFNet) for RGB-T segmentation. In the encoder, we introduce a cross-level interaction fusion strategy (CLIF) and an inter-correlation feature fusion method (IFFM) to enrich multi-scale feature expression and improve fusion accuracy. In the decoder, we propose a residual dense pixel convolution (R-DPC) with a trainable upsampling unit that dynamically reconstructs information lost during encoding, particularly for distant objects whose features may vanish after pooling. Experimental results show that DOFFNet achieves a top mean pixel accuracy of 75.8% and markedly improves accuracy on four classes whose objects occupy as little as 0.2%-2% of the total pixels. This improvement yields more reliable performance in practical applications, particularly where small-object detection is critical, and the method shows potential applicability in other fields such as medical imaging and remote sensing.
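The abstract describes the two components only at the architectural level; the paper's implementation is not reproduced here. Below is a minimal PyTorch sketch, assuming standard building blocks: a cross-level fusion module in the spirit of CLIF (concatenating RGB and thermal features from two adjacent encoder levels and re-weighting channels), and a trainable 2x upsampling block in the spirit of R-DPC, realized here with a sub-pixel (PixelShuffle) convolution plus a residual path. All class names, layer choices, and hyperparameters are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch only: module names (CrossLevelFusion,
# ResidualDensePixelConv) and all hyperparameters are assumptions,
# not the DOFFNet implementation from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossLevelFusion(nn.Module):
    """Fuse RGB and thermal features from two adjacent encoder levels."""

    def __init__(self, c_low: int, c_high: int, c_out: int):
        super().__init__()
        # Concatenation of 2 modalities x 2 levels -> 2 * (c_low + c_high) channels.
        self.reduce = nn.Conv2d(2 * (c_low + c_high), c_out, kernel_size=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(c_out, c_out, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_low, t_low, rgb_high, t_high):
        # Bring the coarser (high-level) maps to the finer resolution.
        size = rgb_low.shape[-2:]
        rgb_high = F.interpolate(rgb_high, size=size, mode="bilinear",
                                 align_corners=False)
        t_high = F.interpolate(t_high, size=size, mode="bilinear",
                               align_corners=False)
        fused = self.reduce(torch.cat([rgb_low, t_low, rgb_high, t_high], dim=1))
        # Channel gate: emphasise modality/level combinations that correlate.
        return fused * self.gate(fused)


class ResidualDensePixelConv(nn.Module):
    """Trainable 2x upsampling via sub-pixel convolution with a residual path."""

    def __init__(self, c_in: int):
        super().__init__()
        self.expand = nn.Conv2d(c_in, 4 * c_in, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(2)  # learned upsampling filters
        self.refine = nn.Conv2d(c_in, c_in, kernel_size=3, padding=1)

    def forward(self, x):
        up = self.shuffle(self.expand(x))            # (H, W) -> (2H, 2W)
        skip = F.interpolate(x, scale_factor=2, mode="bilinear",
                             align_corners=False)    # parameter-free reference
        return F.relu(self.refine(up) + skip)        # residual detail recovery


if __name__ == "__main__":
    fuse = CrossLevelFusion(c_low=64, c_high=128, c_out=64)
    up = ResidualDensePixelConv(c_in=64)
    rgb_l, t_l = torch.randn(1, 64, 120, 160), torch.randn(1, 64, 120, 160)
    rgb_h, t_h = torch.randn(1, 128, 60, 80), torch.randn(1, 128, 60, 80)
    out = up(fuse(rgb_l, t_l, rgb_h, t_h))
    print(out.shape)  # torch.Size([1, 64, 240, 320])
```

The sub-pixel path is one plausible reading of a "trainable upsampling unit": unlike fixed bilinear interpolation, its filters can learn to re-synthesise fine detail for small, distant objects whose activations were attenuated by pooling.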
Pages: 13