Lightweight Multi-modal Representation Learning for RGB Salient Object Detection

Cited: 1
Authors
Xiao, Yun [1 ,2 ,4 ]
Huang, Yameng [3 ]
Li, Chenglong [1 ,2 ,4 ]
Liu, Lei [3 ]
Zhou, Aiwu [3 ]
Tang, Jin [3 ]
Affiliations
[1] Informat Mat & Intelligent Sensing Lab Anhui Prov, Hefei 230601, Peoples R China
[2] Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China
[3] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Peoples R China
[4] Anhui Univ, Sch Artificial Intelligence, Hefei 230601, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Salient object detection; Depth estimation; Lightweight network; Multi-modal representation learning; Network
DOI
10.1007/s12559-023-10148-1
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The task of salient object detection (SOD) often faces challenges such as complex backgrounds and low appearance contrast. Depth information, which reflects the geometric shape of an object's surface, can supplement visible information and has received increasing interest in SOD. However, depth sensors work only under limited conditions and within a limited range (e.g., at most 4-5 m in indoor scenes), and their imaging quality is usually low. We therefore design a lightweight network that infers depth features with low computational cost: it needs only a few parameters to effectively capture depth-specific features by fusing high-level features from the RGB modality. Because both the RGB features and the inferred depth features may contain noise, we design a fusion network, consisting of a self-attention-based feature interaction module and a foreground-background enhancement module, to achieve adaptive fusion of RGB and depth features. In addition, we introduce a multi-scale fusion module with different dilated convolutions to leverage useful local and global context cues. Experimental results on five benchmark datasets show that our approach significantly outperforms state-of-the-art RGB-D SOD methods and performs comparably against state-of-the-art RGB SOD methods, demonstrating that our multi-modal representation learning can cope with the imaging limitations of single-modality data for RGB salient object detection.
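To make the multi-scale fusion idea concrete, the following is a minimal PyTorch sketch of a fusion block built from parallel dilated 3x3 convolutions, in the spirit of the module described in the abstract. It is an illustrative assumption, not the authors' released implementation; the class name MultiScaleFusion, the channel sizes, and the dilation rates (1, 2, 4, 8) are chosen only for the example.

import torch
import torch.nn as nn


class MultiScaleFusion(nn.Module):
    """Aggregate local and global context with parallel dilated 3x3 convolutions.

    Hypothetical sketch of a multi-scale fusion block; not the authors' code.
    """

    def __init__(self, in_channels: int, out_channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One branch per dilation rate; padding == dilation keeps spatial size.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1 convolution merges the concatenated multi-scale branches.
        self.project = nn.Sequential(
            nn.Conv2d(out_channels * len(dilations), out_channels,
                      kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))


if __name__ == "__main__":
    # Example: fuse a combined RGB/depth feature map of shape (B, 256, 20, 20).
    x = torch.randn(2, 256, 20, 20)
    out = MultiScaleFusion(256, 128)(x)
    print(out.shape)  # torch.Size([2, 128, 20, 20])

A block like this would typically be placed on the fused RGB/depth features before the prediction head, so that both fine local detail and wider scene context contribute to the saliency map.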
Pages: 1868-1883 (16 pages)