Transformer-Based Cross-Modal Integration Network for RGB-T Salient Object Detection

Citations: 2
Authors
Lv, Chengtao [1 ]
Zhou, Xiaofei [1 ]
Wan, Bin [1 ]
Wang, Shuai [2 ,3 ]
Sun, Yaoqi [1 ,3 ]
Zhang, Jiyong [1 ]
Yan, Chenggang [2 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Commun Engn, Hangzhou 310018, Peoples R China
[3] Hangzhou Dianzi Univ, Lishui Inst, Lishui 323000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Transformers; Semantics; Decoding; Aggregates; Object detection; Fuses; Salient object detection; collaborative spatial attention; feature interaction; Swin transformer; interactive complement; IMAGE; KERNEL;
DOI
10.1109/TCE.2024.3390841
CLC (Chinese Library Classification)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification codes
0808; 0809;
Abstract
Salient object detection (SOD) can be applied in the consumer electronics field, where it helps to identify and locate objects of interest. RGB/RGB-D (depth) salient object detection has achieved great progress in recent years. However, there is considerable room for improvement in exploring the complementarity of two-modal information for RGB-T (thermal) SOD. Therefore, this paper proposes a Transformer-based Cross-modal Integration Network (i.e., TCINet) to detect salient objects in RGB-T images, which can properly fuse two-modal features and interactively aggregate two-level features. Our method consists of Siamese Swin Transformer-based encoders, a cross-modal feature fusion (CFF) module, and an interaction-based feature decoding (IFD) block. Here, the CFF module is designed to fuse the complementary information of the two-modal features, where collaborative spatial attention emphasizes salient regions and suppresses background regions of both modalities. Furthermore, we deploy the IFD block to aggregate two-level features, namely the previous-level fused feature and the current-level encoder feature, where the IFD block bridges the large semantic gap and reduces noise. Extensive experiments are conducted on three RGB-T datasets, and the experimental results clearly demonstrate the superiority and effectiveness of our method compared with cutting-edge saliency methods. The results and code of our method will be available at https://github.com/lvchengtao/TCINet.
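The collaborative-spatial-attention fusion described in the abstract can be caricatured in a few lines of NumPy. This is only an illustrative sketch of the general idea (a joint spatial map computed from both modalities re-weights each feature before fusion), not the paper's actual CFF implementation; the function names, the mean/max channel pooling, and the additive fusion rule are all assumptions.

```python
import numpy as np

def spatial_attention_map(feat):
    # feat: (C, H, W). Collapse channels with mean and max pooling,
    # then squash to (0, 1) with a sigmoid -- a common spatial-attention recipe.
    avg = feat.mean(axis=0)   # (H, W)
    mx = feat.max(axis=0)     # (H, W)
    return 1.0 / (1.0 + np.exp(-(avg + mx)))

def collaborative_fusion(rgb_feat, thermal_feat):
    # Hypothetical sketch of collaborative spatial attention: one joint map
    # derived from both modalities emphasizes shared salient regions and
    # suppresses background in each feature before the two are summed.
    joint = spatial_attention_map(rgb_feat + thermal_feat)  # (H, W)
    return rgb_feat * joint + thermal_feat * joint          # broadcasts over C

rgb = np.random.rand(64, 14, 14).astype(np.float32)
thermal = np.random.rand(64, 14, 14).astype(np.float32)
fused = collaborative_fusion(rgb, thermal)
print(fused.shape)  # (64, 14, 14)
```

In the actual network this fusion would operate on Swin Transformer feature maps at each encoder level, with learned attention weights rather than fixed pooling.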
Pages: 4741 - 4755
Page count: 15
Related papers
50 records
  • [21] Bidirectional Alternating Fusion Network for RGB-T Salient Object Detection
    Tu, Zhengzheng
    Lin, Danying
    Jiang, Bo
    Gu, Le
    Wang, Kunpeng
    Zhai, Sulan
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VIII, 2025, 15038 : 34 - 48
  • [22] SIA: RGB-T salient object detection network with salient-illumination awareness
    Song, Kechen
    Wen, Hongwei
    Ji, Yingying
    Xue, Xiaotong
    Huang, Liming
    Yan, Yunhui
    Meng, Qinggang
    OPTICS AND LASERS IN ENGINEERING, 2024, 172
  • [23] TANet: Transformer-based asymmetric network for RGB-D salient object detection
    Liu, Chang
    Yang, Gang
    Wang, Shuo
    Wang, Hangxu
    Zhang, Yunhua
    Wang, Yutao
    IET COMPUTER VISION, 2023, 17 (04) : 415 - 430
  • [24] Transformer-based difference fusion network for RGB-D salient object detection
    Cui, Zhi-Qiang
    Wang, Feng
    Feng, Zheng-Yong
    JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (06)
  • [25] GOSNet: RGB-T salient object detection network based on Global Omnidirectional Scanning
    Jiang, Bochang
    Luo, Dan
    Shang, Zihan
    Liu, Sicheng
    NEUROCOMPUTING, 2025, 630
  • [26] Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection
    Gao, Wei
    Liao, Guibiao
    Ma, Siwei
    Li, Ge
    Liang, Yongsheng
    Lin, Weisi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2091 - 2106
  • [27] Cross-Modal Fusion and Progressive Decoding Network for RGB-D Salient Object Detection
    Hu, Xihang
    Sun, Fuming
    Sun, Jing
    Wang, Fasheng
    Li, Haojie
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (08) : 3067 - 3085
  • [28] Depth Enhanced Cross-Modal Cascaded Network for RGB-D Salient Object Detection
    Zhao, Zhengyun
    Huang, Ziqing
    Chai, Xiuli
    Wang, Jun
    NEURAL PROCESSING LETTERS, 2023, 55 (01) : 361 - 384
  • [29] CMIGNet: Cross-Modal Inverse Guidance Network for RGB-Depth salient object detection
    Zhu, Hegui
    Ni, Jia
    Yang, Xi
    Zhang, Libo
    PATTERN RECOGNITION, 2024, 155