Salient Object Detection in RGB-D Videos

Authors
Mou, Ao [1]
Lu, Yukang [1]
He, Jiahao [1]
Min, Dingyao [1]
Fu, Keren [1]
Zhao, Qijun [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
Keywords
Salient object detection; RGB-D videos; depth; optical flow; multi-modal fusion; NETWORK; OPTIMIZATION; FUSION
DOI
10.1109/TIP.2024.3498326
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Given the widespread adoption of depth-sensing acquisition devices, RGB-D videos and related data/media have gained considerable traction in various aspects of daily life. Consequently, conducting salient object detection (SOD) in RGB-D videos presents a highly promising and evolving avenue. Despite the potential of this area, SOD in RGB-D videos remains somewhat under-explored, with RGB-D SOD and video SOD (VSOD) traditionally studied in isolation. To explore this emerging field, this paper makes two primary contributions: a dataset and a model. First, we construct the RDVS dataset, a new RGB-D VSOD dataset with realistic depth, characterized by its diversity of scenes and rigorous frame-by-frame annotations. We validate the dataset through comprehensive attribute and object-oriented analyses, and provide training and testing splits. Second, we introduce DCTNet+, a three-stream network tailored for RGB-D VSOD that emphasizes the RGB modality and treats depth and optical flow as auxiliary modalities. In pursuit of effective feature enhancement, refinement, and fusion for precise final prediction, we propose two modules: the multi-modal attention module (MAM) and the refinement fusion module (RFM). To enhance interaction and fusion within RFM, we design a universal interaction module (UIM) and then integrate holistic multi-modal attentive paths (HMAPs) for refining multi-modal low-level features before they reach the RFMs. Comprehensive experiments, conducted on pseudo RGB-D video datasets alongside our proposed RDVS, highlight the superiority of DCTNet+ over 19 VSOD models and 14 RGB-D SOD models. Additionally, ablation experiments were performed on both pseudo and realistic RGB-D video datasets to demonstrate the advantages of individual modules as well as the necessity of introducing realistic depth into VSOD. Our code, together with the RDVS dataset, will be available at https://github.com/kerenfu/RDVS/.
Pages: 6660-6675
Number of pages: 16
Related papers
50 in total
  • [41] SALIENT OBJECT DETECTION FOR RGB-D IMAGE VIA SALIENCY EVOLUTION
    Guo, Jingfan
    Ren, Tongwei
    Bei, Jia
    2016 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO (ICME), 2016,
  • [42] MULTI-MODAL TRANSFORMER FOR RGB-D SALIENT OBJECT DETECTION
    Song, Peipei
    Zhang, Jing
    Koniusz, Piotr
    Barnes, Nick
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 2466 - 2470
  • [43] RGB-D Fusion Based on Fuzzy Optimization for Salient Object Detection
    Bhuyan, Sudipta
    Sen, Debashis
    Deb, Sankha
    PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2023, 2023, 14301 : 523 - 531
  • [44] Context-aware network for RGB-D salient object detection
    Liang, Fangfang
    Duan, Lijuan
    Ma, Wei
    Qiao, Yuanhua
    Miao, Jun
    Ye, Qixiang
    PATTERN RECOGNITION, 2021, 111
  • [45] CDNet: Complementary Depth Network for RGB-D Salient Object Detection
    Jin, Wen-Da
    Xu, Jun
    Han, Qi
    Zhang, Yi
    Cheng, Ming-Ming
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 3376 - 3390
  • [46] Scale Adaptive Fusion Network for RGB-D Salient Object Detection
    Kong, Yuqiu
    Zheng, Yushuo
    Yao, Cuili
    Liu, Yang
    Wang, He
    COMPUTER VISION - ACCV 2022, PT III, 2023, 13843 : 608 - 625
  • [47] RGB-D Salient Object Detection by a CNN With Multiple Layers Fusion
    Huang, Rui
    Xing, Yan
    Wang, ZeZheng
    IEEE SIGNAL PROCESSING LETTERS, 2019, 26 (04) : 552 - 556
  • [48] Salient object detection for RGB-D images by generative adversarial network
    Liu, Zhengyi
    Tang, Jiting
    Xiang, Qian
    Zhao, Peng
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (35-36) : 25403 - 25425
  • [49] CoCNN: RGB-D deep fusion for stereoscopic salient object detection
    Liang, Fangfang
    Duan, Lijuan
    Ma, Wei
    Qiao, Yuanhua
    Cai, Zhi
    Miao, Jun
    Ye, Qixiang
    PATTERN RECOGNITION, 2020, 104 (104)
  • [50] An adaptive guidance fusion network for RGB-D salient object detection
    Haodong Sun
    Yu Wang
    Xinpeng Ma
    Signal, Image and Video Processing, 2024, 18 : 1683 - 1693