Lightweight Multi-modal Representation Learning for RGB Salient Object Detection

Cited by: 1
Authors
Xiao, Yun [1, 2, 4]
Huang, Yameng [3]
Li, Chenglong [1, 2, 4]
Liu, Lei [3]
Zhou, Aiwu [3]
Tang, Jin [3]
Affiliations
[1] Informat Mat & Intelligent Sensing Lab Anhui Prov, Hefei 230601, Peoples R China
[2] Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China
[3] Anhui Univ, Sch Comp Sci & Technol, Hefei 230601, Peoples R China
[4] Anhui Univ, Sch Artificial Intelligence, Hefei 230601, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Salient object detection; Depth estimation; Lightweight network; Multi-modal representation learning; NETWORK;
DOI
10.1007/s12559-023-10148-1
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Salient object detection (SOD) often faces challenges such as complex backgrounds and low appearance contrast. Depth information, which reflects the geometric shape of an object's surface, can supplement visible information and is receiving increasing interest in SOD. However, depth sensors work only under limited conditions and ranges (e.g., 4-5 m at most in indoor scenes), and their imaging quality is usually low. We design a lightweight network that infers depth features at low computational cost; it needs only a few parameters to capture depth-specific features by fusing high-level features from the RGB modality. Because both the RGB features and the inferred depth features may contain noise, we design a fusion network, comprising a self-attention-based feature interaction module and a foreground-background enhancement module, to adaptively fuse RGB and depth features. In addition, we introduce a multi-scale fusion module with different dilated convolutions to leverage useful local and global context cues. Experimental results on five benchmark datasets show that our approach significantly outperforms state-of-the-art RGBD SOD methods and performs comparably to state-of-the-art RGB SOD methods. Results on multiple RGBD and RGB SOD datasets indicate that our multi-modal representation learning method can address the imaging limitations of single-modality data for RGB salient object detection.
Pages: 1868-1883
Page count: 16
Related papers
50 in total
  • [1] Lightweight Multi-modal Representation Learning for RGB Salient Object Detection
    Yun Xiao
    Yameng Huang
    Chenglong Li
    Lei Liu
    Aiwu Zhou
    Jin Tang
    Cognitive Computation, 2023, 15 : 1868 - 1883
  • [2] MULTI-MODAL TRANSFORMER FOR RGB-D SALIENT OBJECT DETECTION
    Song, Peipei
    Zhang, Jing
    Koniusz, Piotr
    Barnes, Nick
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2466 - 2470
  • [3] RGB-D Salient Object Detection Based on Multi-Modal Feature Interaction
    Gao, Yue
    Dai, Meng
    Zhang, Qing
    Computer Engineering and Applications, 2024, 60 (02) : 211 - 220
  • [4] Learning Adaptive Fusion Bank for Multi-Modal Salient Object Detection
    Wang, Kunpeng
    Tu, Zhengzheng
    Li, Chenglong
    Zhang, Cheng
    Luo, Bin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (08) : 7344 - 7358
  • [5] BMFNet: Bifurcated multi-modal fusion network for RGB-D salient object detection
    Sun, Chenwang
    Zhang, Qing
    Zhuang, Chenyu
    Zhang, Mingqian
    IMAGE AND VISION COMPUTING, 2024, 147
  • [6] Multi-modal deep feature learning for RGB-D object detection
    Xu, Xiangyang
    Li, Yuncheng
    Wu, Gangshan
    Luo, Jiebo
    PATTERN RECOGNITION, 2017, 72 : 300 - 313
  • [7] Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection
    Gao, Wei
    Liao, Guibiao
    Ma, Siwei
    Li, Ge
    Liang, Yongsheng
    Lin, Weisi
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (04) : 2091 - 2106
  • [8] RGB-D Salient Object Detection Method Based on Multi-Modal Fusion and Contour Guidance
    Peng, Yanbin
    Feng, Mingkun
    Zheng, Zhijun
    IEEE ACCESS, 2023, 11 : 145217 - 145230
  • [9] Lightweight cross-modal transformer for RGB-D salient object detection
    Huang, Nianchang
    Yang, Yang
    Zhang, Qiang
    Han, Jungong
    Huang, Jin
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 249
  • [10] Progressive Guided Fusion Network With Multi-Modal and Multi-Scale Attention for RGB-D Salient Object Detection
    Wu, Jiajia
    Han, Guangliang
    Wang, Haining
    Yang, Hang
    Li, Qingqing
    Liu, Dongxu
    Ye, Fangjian
    Liu, Peixun
    IEEE ACCESS, 2021, 9 : 150608 - 150622