Transformer-Based Cross-Modal Integration Network for RGB-T Salient Object Detection

Cited by: 2
Authors
Lv, Chengtao [1 ]
Zhou, Xiaofei [1 ]
Wan, Bin [1 ]
Wang, Shuai [2 ,3 ]
Sun, Yaoqi [1 ,3 ]
Zhang, Jiyong [1 ]
Yan, Chenggang [2 ]
Affiliations
[1] Hangzhou Dianzi Univ, Sch Automat, Hangzhou 310018, Peoples R China
[2] Hangzhou Dianzi Univ, Sch Commun Engn, Hangzhou 310018, Peoples R China
[3] Hangzhou Dianzi Univ, Lishui Inst, Lishui 323000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Task analysis; Transformers; Semantics; Decoding; Aggregates; Object detection; Fuses; Salient object detection; collaborative spatial attention; feature interaction; Swin transformer; interactive complement; IMAGE; KERNEL;
DOI
10.1109/TCE.2024.3390841
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
Salient object detection (SOD) can be applied in the consumer electronics area, where it helps to identify and locate objects of interest. RGB/RGB-D (depth) salient object detection has achieved great progress in recent years. However, there is still considerable room for improvement in exploiting the complementarity of two-modal information for RGB-T (thermal) SOD. Therefore, this paper proposes a Transformer-based Cross-modal Integration Network (i.e., TCINet) to detect salient objects in RGB-T images, which can properly fuse two-modal features and interactively aggregate two-level features. Our method consists of siamese Swin Transformer-based encoders, the cross-modal feature fusion (CFF) module, and the interaction-based feature decoding (IFD) block. Here, the CFF module is designed to fuse the complementary information of the two modalities, where collaborative spatial attention emphasizes salient regions and suppresses background regions of the two-modal features. Furthermore, we deploy the IFD block to aggregate two-level features, namely the previous-level fused feature and the current-level encoder feature, where the IFD block bridges the large semantic gap between them and reduces noise. Extensive experiments are conducted on three RGB-T datasets, and the experimental results clearly demonstrate the superiority and effectiveness of our method compared with cutting-edge saliency methods. The results and code of our method will be available at https://github.com/lvchengtao/TCINet.
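To make the described pipeline concrete, below is a minimal PyTorch sketch of the two fusion ideas summarized in the abstract: a collaborative spatial attention that reweights both modalities before fusion (CFF-style), and a decoding step that lets the previous-level fused feature and the current-level encoder feature interact (IFD-style). The module names, layer choices, and channel sizes are illustrative assumptions and do not reproduce the authors' released implementation; see the linked repository for the actual code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CollaborativeSpatialAttention(nn.Module):
    """Derives a shared spatial map from both modalities and reweights each (CFF-style sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        # Squeeze the concatenated two-modal feature into a single-channel attention map.
        self.spatial = nn.Conv2d(2 * channels, 1, kernel_size=7, padding=3)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1)

    def forward(self, f_rgb: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
        attn = torch.sigmoid(self.spatial(torch.cat([f_rgb, f_t], dim=1)))
        # Emphasize salient regions and suppress background in both modalities.
        f_rgb = f_rgb * attn + f_rgb
        f_t = f_t * attn + f_t
        return self.fuse(torch.cat([f_rgb, f_t], dim=1))

class InteractiveDecoding(nn.Module):
    """Aggregates the previous-level fused feature with the current-level encoder feature (IFD-style sketch)."""
    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, f_prev: torch.Tensor, f_cur: torch.Tensor) -> torch.Tensor:
        # Upsample the coarser previous-level feature to the current resolution,
        # then let it gate the current-level feature before joint refinement.
        f_prev = F.interpolate(f_prev, size=f_cur.shape[-2:], mode="bilinear", align_corners=False)
        f_cur = f_cur * torch.sigmoid(f_prev) + f_cur
        return self.refine(torch.cat([f_prev, f_cur], dim=1))

if __name__ == "__main__":
    cff = CollaborativeSpatialAttention(channels=64)
    ifd = InteractiveDecoding(channels=64)
    rgb = torch.randn(1, 64, 44, 44)   # current-level RGB encoder feature
    th = torch.randn(1, 64, 44, 44)    # current-level thermal encoder feature
    prev = torch.randn(1, 64, 22, 22)  # previous-level (coarser) fused feature
    out = ifd(prev, cff(rgb, th))
    print(out.shape)  # torch.Size([1, 64, 44, 44])

In this sketch the single-channel attention map is computed jointly from both modalities, so regions that are salient in either input are emphasized in both streams before fusion, which is one plausible reading of "collaborative spatial attention".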
Pages: 4741 - 4755
Number of pages: 15
Related Papers
50 records in total
  • [41] Global Guided Cross-Modal Cross-Scale Network for RGB-D Salient Object Detection
    Wang, Shuaihui
    Jiang, Fengyi
    Xu, Boqian
    SENSORS, 2023, 23 (16)
  • [42] CrowdFusion: Refined Cross-Modal Fusion Network for RGB-T Crowd Counting
    Cai, Jialu
    Wang, Qing
    Jiang, Shengqin
    BIOMETRIC RECOGNITION, CCBR 2023, 2023, 14463 : 427 - 436
  • [43] Multi-scale Cross-Modal Transformer Network for RGB-D Object Detection
    Xiao, Zhibin
    Xie, Pengwei
    Wang, Guijin
    MULTIMEDIA MODELING (MMM 2022), PT I, 2022, 13141 : 352 - 363
  • [44] Cross-modal refined adjacent-guided network for RGB-D salient object detection
    Bi H.
    Zhang J.
    Wu R.
    Tong Y.
    Jin W.
    MULTIMEDIA TOOLS AND APPLICATIONS, 24: 37453 - 37478
  • [45] Cross-Modality Double Bidirectional Interaction and Fusion Network for RGB-T Salient Object Detection
    Xie, Zhengxuan
    Shao, Feng
    Chen, Gang
    Chen, Hangwei
    Jiang, Qiuping
    Meng, Xiangchao
    Ho, Yo-Sung
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (08) : 4149 - 4163
  • [46] CGMDRNet: Cross-Guided Modality Difference Reduction Network for RGB-T Salient Object Detection
    Chen, Gang
    Shao, Feng
    Chai, Xiongli
    Chen, Hangwei
    Jiang, Qiuping
    Meng, Xiangchao
    Ho, Yo-Sung
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6308 - 6323
  • [47] Multi-level cross-modal interaction network for RGB-D salient object detection
    Huang, Zhou
    Chen, Huai-Xin
    Zhou, Tao
    Yang, Yun-Zhi
    Liu, Bi-Yuan
    NEUROCOMPUTING, 2021, 452 : 200 - 211
  • [48] FEATURE ENHANCEMENT AND FUSION FOR RGB-T SALIENT OBJECT DETECTION
    Sun, Fengming
    Zhang, Kang
    Yuan, Xia
    Zhao, Chunxia
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1300 - 1304
  • [49] Revisiting Feature Fusion for RGB-T Salient Object Detection
    Zhang, Qiang
    Xiao, Tonglin
    Huang, Nianchang
    Zhang, Dingwen
    Han, Jungong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2021, 31 (05) : 1804 - 1818
  • [50] Enabling modality interactions for RGB-T salient object detection
    Zhang, Qiang
    Xi, Ruida
    Xiao, Tonglin
    Huang, Nianchang
    Luo, Yongjiang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2022, 222