Lightweight Multi-modal Representation Learning for RGB Salient Object Detection

Cited: 1
Authors
Xiao, Yun [1 ,2 ,4 ]
Huang, Yameng [3 ]
Li, Chenglong [1 ,2 ,4 ]
Liu, Lei [3 ]
Zhou, Aiwu [3 ]
Tang, Jin [3 ]
Affiliations
[1] Informat Mat & Intelligent Sensing Lab Anhui Prov, Hefei 230601, Peoples R China
[2] Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China
[3] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Peoples R China
[4] Anhui Univ, Sch Artificial Intelligence, Hefei 230601, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Salient object detection; Depth estimation; Lightweight network; Multi-modal representation learning; Network
DOI
10.1007/s12559-023-10148-1
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
The task of salient object detection (SOD) often faces challenges such as complex backgrounds and low appearance contrast. Depth information, which reflects the geometric shape of an object's surface, can supplement visible information and has received increasing interest in SOD. However, depth sensors work only under limited conditions and within a limited range (e.g., at most 4-5 m in indoor scenes), and their imaging quality is usually low. We therefore design a lightweight network that infers depth features with low computational cost: it needs only a few parameters to effectively capture depth-specific features by fusing high-level features from the RGB modality. Because both the RGB features and the inferred depth features may contain noise, we design a fusion network, consisting of a self-attention-based feature interaction module and a foreground-background enhancement module, to achieve adaptive fusion of RGB and depth features. In addition, we introduce a multi-scale fusion module with different dilated convolutions to leverage useful local and global context cues. Experimental results on five benchmark datasets show that our approach significantly outperforms state-of-the-art RGB-D SOD methods and performs comparably against state-of-the-art RGB SOD methods, demonstrating that our multi-modal representation learning can cope with the imaging limitations of single-modality data for RGB salient object detection.
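To make the multi-scale fusion idea concrete, the following is a minimal PyTorch sketch of a fusion block built from parallel dilated 3x3 convolutions, in the spirit of the module described in the abstract. It is an illustrative assumption, not the authors' released implementation; the class name MultiScaleFusion, the channel sizes, and the dilation rates (1, 2, 4, 8) are chosen only for the example.

import torch
import torch.nn as nn


class MultiScaleFusion(nn.Module):
    """Aggregate local and global context with parallel dilated 3x3 convolutions.

    Hypothetical sketch of a multi-scale fusion block; not the authors' code.
    """

    def __init__(self, in_channels: int, out_channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        # One branch per dilation rate; padding == dilation keeps spatial size.
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=3,
                          padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1 convolution merges the concatenated multi-scale branches.
        self.project = nn.Sequential(
            nn.Conv2d(out_channels * len(dilations), out_channels,
                      kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = [branch(x) for branch in self.branches]
        return self.project(torch.cat(feats, dim=1))


if __name__ == "__main__":
    # Example: fuse a combined RGB/depth feature map of shape (B, 256, 20, 20).
    x = torch.randn(2, 256, 20, 20)
    out = MultiScaleFusion(256, 128)(x)
    print(out.shape)  # torch.Size([2, 128, 20, 20])

A block like this would typically be placed on the fused RGB/depth features before the prediction head, so that both fine local detail and wider scene context contribute to the saliency map.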
Pages: 1868-1883 (16 pages)