Salient Object Detection in RGB-D Videos

Cited by: 0

Authors:

Mou, Ao [1]

Lu, Yukang [1]

He, Jiahao [1]

Min, Dingyao [1]

Fu, Keren [1]

Zhao, Qijun [1]

Affiliations:

[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2024 / Vol. 33

Keywords:

Salient object detection; RGB-D videos; depth; optical flow; multi-modal fusion; NETWORK; OPTIMIZATION; FUSION;

DOI:

10.1109/TIP.2024.3498326

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Given the widespread adoption of depth-sensing acquisition devices, RGB-D videos and related data/media have gained considerable traction in various aspects of daily life. Consequently, conducting salient object detection (SOD) in RGB-D videos presents a highly promising and evolving avenue. Despite the potential of this area, SOD in RGB-D videos remains somewhat under-explored, with RGB-D SOD and video SOD (VSOD) traditionally studied in isolation. To explore this emerging field, this paper makes two primary contributions: the dataset and the model. On one front, we construct the RDVS dataset, a new RGB-D VSOD dataset with realistic depth, characterized by its diversity of scenes and rigorous frame-by-frame annotations. We validate the dataset through comprehensive attribute and object-oriented analyses, and provide training and testing splits. Moreover, we introduce DCTNet+, a three-stream network tailored for RGB-D VSOD that emphasizes the RGB modality and treats depth and optical flow as auxiliary modalities. In pursuit of effective feature enhancement, refinement, and fusion for precise final prediction, we propose two modules: the multi-modal attention module (MAM) and the refinement fusion module (RFM). To enhance interaction and fusion within RFM, we design a universal interaction module (UIM) and then integrate holistic multi-modal attentive paths (HMAPs) for refining multi-modal low-level features before reaching RFMs. Comprehensive experiments, conducted on pseudo RGB-D video datasets alongside our proposed RDVS, highlight the superiority of DCTNet+ over 19 VSOD models and 14 RGB-D SOD models. Additionally, insightful ablation experiments were performed on both pseudo and realistic RGB-D video datasets to demonstrate the advantages of individual modules as well as the necessity of introducing realistic depth into VSOD. Our code, together with the RDVS dataset, will be available at https://github.com/kerenfu/RDVS/.

Pages: 6660-6675

Page count: 16

50 items in total

[1] RGB-D salient object detection: A survey
Tao Zhou
Deng-Ping Fan
Ming-Ming Cheng
Jianbing Shen
Ling Shao
Computational Visual Media, 2021, 7 (01) : 37 - 69
[2] RGB-D salient object detection: A survey
Zhou, Tao
Fan, Deng-Ping
Cheng, Ming-Ming
Shen, Jianbing
Shao, Ling
COMPUTATIONAL VISUAL MEDIA, 2021, 7 (01) : 37 - 69
[3] RGB-D salient object detection: A survey
Tao Zhou
Deng-Ping Fan
Ming-Ming Cheng
Jianbing Shen
Ling Shao
Computational Visual Media, 2021, 7 : 37 - 69
[4] Calibrated RGB-D Salient Object Detection
Ji, Wei
Li, Jingjing
Yu, Shuang
Zhang, Miao
Piao, Yongri
Yao, Shunyu
Bi, Qi
Ma, Kai
Zheng, Yefeng
Lu, Huchuan
Cheng, Li
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 9466 - 9476
[5] DVSOD: RGB-D Video Salient Object Detection
Li, Jingjing
Ji, Wei
Wang, Size
Li, Wenbo
Cheng, Li
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[6] Advancing in RGB-D Salient Object Detection: A Survey
Chen, Ai
Li, Xin
He, Tianxiang
Zhou, Junlin
Chen, Duanbing
APPLIED SCIENCES-BASEL, 2024, 14 (17):
[7] Adaptive Fusion for RGB-D Salient Object Detection
Wang, Ningning
Gong, Xiaojin
IEEE ACCESS, 2019, 7 : 55277 - 55284
[8] AirSOD: A Lightweight Network for RGB-D Salient Object Detection
Zeng, Zhihong
Liu, Haijun
Chen, Fenglei
Tan, Xiaoheng
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (03) : 1656 - 1669
[9] Aggregate interactive learning for RGB-D salient object detection
Wu, Jingyu
Sun, Fuming
Xu, Rui
Meng, Jie
Wang, Fasheng
EXPERT SYSTEMS WITH APPLICATIONS, 2022, 195
[10] Local Background Enclosure for RGB-D Salient Object Detection
Feng, David
Barnes, Nick
You, Shaodi
McCarthy, Chris
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, : 2343 - 2350