Cross-level interaction fusion network-based RGB-T semantic segmentation for distant targets

Cited by: 0
Authors
Chen, Yu [1 ]
Li, Xiang [1 ]
Luan, Chao [2 ]
Hou, Weimin [2 ]
Liu, Haochen [2 ]
Zhu, Zihui [3 ]
Xue, Lian [3 ]
Zhang, Jianqi [1 ]
Liu, Delian [1 ]
Wu, Xin [1 ]
Wei, Linfang [1 ]
Jian, Chaochao [1 ]
Li, Jinze [1 ]
Affiliations
[1] Xidian Univ, Sch Optoelect Engn, Xian 710071, Peoples R China
[2] Beijing Inst Control & Elect Technol, Beijing 100038, Peoples R China
[3] Natl Key Lab Sci & Technol Test Phys & Numer Math, Beijing 100076, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Semantic segmentation; Feature fusion; Cross modality; Multi-scale information; Distant object;
DOI
10.1016/j.patcog.2024.111218
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
RGB-T segmentation is an emerging approach driven by advances in multispectral detection and is poised to replace traditional RGB-only segmentation. An effective cross-modality feature fusion module is essential to this technology; the precise segmentation of distant objects is another significant challenge. Focusing on these two problems, we propose an end-to-end distant object feature fusion network (DOFFNet) for RGB-T segmentation. First, we introduce a cross-level interaction fusion strategy (CLIF) and an inter-correlation fusion method (IFFM) in the encoder to enrich multi-scale feature expression and improve fusion accuracy. Second, we propose a residual dense pixel convolution (R-DPC) in the decoder with a trainable upsampling unit that dynamically reconstructs information lost during encoding, particularly for distant objects whose features may vanish after pooling. Experimental results show that DOFFNet achieves a top mean pixel accuracy of 75.8% and markedly improves accuracy on four classes, including objects that occupy as little as 0.2%-2% of total pixels. This improvement yields more reliable performance in practical applications where small-object detection is critical, and the method shows potential applicability in other fields such as medical imaging and remote sensing.
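The abstract's two core ideas can be illustrated in miniature: a trainable upsampling unit (a stride-2 transposed convolution is one common way to make upsampling learnable) and cross-level fusion of a deep, low-resolution feature with a shallow, high-resolution one. This is a hedged sketch, not the paper's actual CLIF/IFFM/R-DPC design; the function names, the single-channel simplification, and the fixed blend weight `alpha` are our own illustrative assumptions.

```python
import numpy as np

def transposed_conv2x(x, w):
    # 2x upsampling via a stride-2 transposed convolution (single channel,
    # 2x2 kernel): each input pixel "stamps" the scaled kernel onto a
    # disjoint 2x2 patch of the output.
    H, W = x.shape
    out = np.zeros((2 * H, 2 * W))
    for i in range(H):
        for j in range(W):
            out[2 * i:2 * i + 2, 2 * j:2 * j + 2] += x[i, j] * w
    return out

def fuse_cross_level(shallow, deep, w_up, alpha=0.5):
    # Cross-level fusion: upsample the deep (low-resolution) feature to the
    # shallow feature's resolution, then blend the two levels.
    return alpha * shallow + (1.0 - alpha) * transposed_conv2x(deep, w_up)
```

In a real network the kernel `w_up` would be learned per channel by backpropagation (e.g., via `nn.ConvTranspose2d` in PyTorch); the NumPy loop above only makes the arithmetic of "learnable upsampling" explicit.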
Pages: 13
Related Papers
50 records in total
  • [41] DBCNet: Dynamic Bilateral Cross-Fusion Network for RGB-T Urban Scene Understanding in Intelligent Vehicles
    Zhou, Wujie
    Gong, Tingting
    Lei, Jingsheng
    Yu, Lu
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2023, 53 (12): 7631-7641
  • [42] Colorectal polyp segmentation method based on fusion of transformer and cross-level phase awareness
    Liang L.
    He A.
    Zhu C.
    Sheng X.
Shengwu Yixue Gongchengxue Zazhi/Journal of Biomedical Engineering, 2023, 40 (02): 234-243
  • [43] Attention-aware Cross-modal Cross-level Fusion Network for RGB-D Salient Object Detection
    Chen, Hao
    Li, You-Fu
    Su, Dan
2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018: 6821-6826
  • [44] Feature Pyramid Network Based Spatial Attention and Cross-Level Semantic Similarity for Diseases Segmentation From Capsule Endoscopy Images
    Charfi, Said
    EL Ansari, Mohamed
    Koutti, Lahcen
    Eljaafari, Ilyas
    Ellahyani, Ayoub
    INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2024, 34 (06)
  • [45] C4Net: Excavating Cross-Modal Context- and Content-Complementarity for RGB-T Semantic Segmentation
    Zhao, Shenlu
    Li, Jingyi
    Zhang, Qiang
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (02): 1347-1361
  • [46] Transformer-Based Cross-Modal Integration Network for RGB-T Salient Object Detection
    Lv, Chengtao
    Zhou, Xiaofei
    Wan, Bin
    Wang, Shuai
    Sun, Yaoqi
    Zhang, Jiyong
    Yan, Chenggang
IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (02): 4741-4755
  • [47] CGFNet: cross-guided fusion network for RGB-thermal semantic segmentation CGI PaperID: 105
    Fu, Yanping
    Chen, Qiaoqiao
    Zhao, Haifeng
VISUAL COMPUTER, 2022, 38 (9-10): 3243-3252
  • [48] DBCAN: DFormer-Based Cross-Attention Network for RGB Depth Semantic Segmentation
    Wu, Aihua
    Fu, Liuxu
APPLIED SCIENCES-BASEL, 2024, 14 (18)
  • [49] Three-stream RGB-D salient object detection network based on cross-level and cross-modal dual-attention fusion
    Meng, Lingbing
    Yuan, Mengya
    Shi, Xuehan
    Liu, Qingqing
    Cheng, Fei
    Li, Lingli
IET IMAGE PROCESSING, 2023, 17 (11): 3292-3308
  • [50] A Cross-Modal Feature Fusion Model Based on ConvNeXt for RGB-D Semantic Segmentation
    Tang, Xiaojiang
    Li, Baoxia
    Guo, Junwei
    Chen, Wenzhuo
    Zhang, Dan
    Huang, Feng
    MATHEMATICS, 2023, 11 (08)