Transformer-Based Cross-Modal Integration Network for RGB-T Salient Object Detection

Citations: 2
Authors
Lv, Chengtao [1 ]
Zhou, Xiaofei [1 ]
Wan, Bin [1 ]
Wang, Shuai [2 ,3 ]
Sun, Yaoqi [1 ,3 ]
Zhang, Jiyong [1 ]
Yan, Chenggang [2 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Commun Engn, Hangzhou 310018, Peoples R China
[3] Hangzhou Dianzi Univ, Lishui Inst, Lishui 323000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Transformers; Semantics; Decoding; Aggregates; Object detection; Fuses; Salient object detection; collaborative spatial attention; feature interaction; Swin transformer; interactive complement; IMAGE; KERNEL;
DOI
10.1109/TCE.2024.3390841
CLC (Chinese Library Classification)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification codes
0808; 0809;
Abstract
Salient object detection (SOD) can be applied in the consumer electronics field, where it helps to identify and locate objects of interest. RGB/RGB-D (depth) salient object detection has achieved great progress in recent years. However, there is considerable room for improvement in exploring the complementarity of two-modal information for RGB-T (thermal) SOD. Therefore, this paper proposes a Transformer-based Cross-modal Integration Network (i.e., TCINet) to detect salient objects in RGB-T images, which can properly fuse two-modal features and interactively aggregate two-level features. Our method consists of Siamese Swin Transformer-based encoders, a cross-modal feature fusion (CFF) module, and an interaction-based feature decoding (IFD) block. Here, the CFF module is designed to fuse the complementary information of the two-modal features, where collaborative spatial attention emphasizes salient regions and suppresses background regions of both modalities. Furthermore, we deploy the IFD block to aggregate two-level features, namely the previous-level fused feature and the current-level encoder feature, where the IFD block bridges the large semantic gap and reduces noise. Extensive experiments are conducted on three RGB-T datasets, and the experimental results clearly demonstrate the superiority and effectiveness of our method compared with cutting-edge saliency methods. The results and code of our method will be available at https://github.com/lvchengtao/TCINet.
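The collaborative-spatial-attention fusion described in the abstract can be caricatured in a few lines of NumPy. This is only an illustrative sketch of the general idea (a joint spatial map computed from both modalities re-weights each feature before fusion), not the paper's actual CFF implementation; the function names, the mean/max channel pooling, and the additive fusion rule are all assumptions.

```python
import numpy as np

def spatial_attention_map(feat):
    # feat: (C, H, W). Collapse channels with mean and max pooling,
    # then squash to (0, 1) with a sigmoid -- a common spatial-attention recipe.
    avg = feat.mean(axis=0)   # (H, W)
    mx = feat.max(axis=0)     # (H, W)
    return 1.0 / (1.0 + np.exp(-(avg + mx)))

def collaborative_fusion(rgb_feat, thermal_feat):
    # Hypothetical sketch of collaborative spatial attention: one joint map
    # derived from both modalities emphasizes shared salient regions and
    # suppresses background in each feature before the two are summed.
    joint = spatial_attention_map(rgb_feat + thermal_feat)  # (H, W)
    return rgb_feat * joint + thermal_feat * joint          # broadcasts over C

rgb = np.random.rand(64, 14, 14).astype(np.float32)
thermal = np.random.rand(64, 14, 14).astype(np.float32)
fused = collaborative_fusion(rgb, thermal)
print(fused.shape)  # (64, 14, 14)
```

In the actual network this fusion would operate on Swin Transformer feature maps at each encoder level, with learned attention weights rather than fixed pooling.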
Pages: 4741 - 4755
Page count: 15
Related papers
50 records
  • [21] Bidirectional Alternating Fusion Network for RGB-T Salient Object Detection
    Tu, Zhengzheng
    Lin, Danying
    Jiang, Bo
    Gu, Le
    Wang, Kunpeng
    Zhai, Sulan
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2024, PT VIII, 2025, 15038 : 34 - 48
  • [22] SIA: RGB-T salient object detection network with salient-illumination awareness
    Song, Kechen
    Wen, Hongwei
    Ji, Yingying
    Xue, Xiaotong
    Huang, Liming
    Yan, Yunhui
    Meng, Qinggang
    OPTICS AND LASERS IN ENGINEERING, 2024, 172
  • [23] TANet: Transformer-based asymmetric network for RGB-D salient object detection
    Liu, Chang
    Yang, Gang
    Wang, Shuo
    Wang, Hangxu
    Zhang, Yunhua
    Wang, Yutao
    IET COMPUTER VISION, 2023, 17 (04) : 415 - 430
  • [24] Transformer-based difference fusion network for RGB-D salient object detection
    Cui, Zhi-Qiang
    Wang, Feng
    Feng, Zheng-Yong
    JOURNAL OF ELECTRONIC IMAGING, 2022, 31 (06)
  • [25] GOSNet: RGB-T salient object detection network based on Global Omnidirectional Scanning
    Jiang, Bochang
    Luo, Dan
    Shang, Zihan
    Liu, Sicheng
    NEUROCOMPUTING, 2025, 630
  • [26] Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection
    Gao, Wei
    Liao, Guibiao
    Ma, Siwei
    Li, Ge
    Liang, Yongsheng
    Lin, Weisi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2091 - 2106
  • [27] Cross-Modal Fusion and Progressive Decoding Network for RGB-D Salient Object Detection
    Hu, Xihang
    Sun, Fuming
    Sun, Jing
    Wang, Fasheng
    Li, Haojie
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (08) : 3067 - 3085
  • [28] Depth Enhanced Cross-Modal Cascaded Network for RGB-D Salient Object Detection
    Zhao, Zhengyun
    Huang, Ziqing
    Chai, Xiuli
    Wang, Jun
    NEURAL PROCESSING LETTERS, 2023, 55 (01) : 361 - 384
  • [29] CMIGNet: Cross-Modal Inverse Guidance Network for RGB-Depth salient object detection
    Zhu, Hegui
    Ni, Jia
    Yang, Xi
    Zhang, Libo
    PATTERN RECOGNITION, 2024, 155