Weakly-Supervised RGBD Video Object Segmentation

Cited by: 0
Authors
Yang, Jinyu [1 ,2 ]
Gao, Mingqi [1 ,3 ]
Zheng, Feng [4 ]
Zhen, Xiantong [5 ]
Ji, Rongrong [6 ]
Shao, Ling [7 ]
Leonardis, Ales [8 ]
Affiliations
[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen 518055, Peoples R China
[2] Univ Birmingham, Birmingham B15 2TT, England
[3] Univ Warwick, Coventry CV4 7AL, England
[4] Southern Univ Sci & Technol, Shenzhen 518055, Peoples R China
[5] Guangdong Univ Petrochem Technol, Coll Comp Sci, Maoming 525011, Peoples R China
[6] Xiamen Univ, Sch Informat, Dept Artificial Intelligence, Media Analyt & Comp Lab, Xiamen 361005, Peoples R China
[7] Univ Chinese Acad Sci, UCAS Terminus AI Lab, Beijing 101408, Peoples R China
[8] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, England
Funding
National Natural Science Foundation of China;
Keywords
Annotations; Object segmentation; Training; Target tracking; Task analysis; Object tracking; Benchmark testing; RGBD data; video object segmentation; visual tracking; TRACKING;
DOI
10.1109/TIP.2024.3374130
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Depth information opens up new opportunities for making video object segmentation (VOS) more accurate and robust in complex scenes. However, RGBD VOS remains largely unexplored because collecting RGBD data is expensive and annotating segmentation masks is time-consuming. In this work, we first introduce a new benchmark for RGBD VOS, named DepthVOS, which contains 350 videos (over 55k frames in total) annotated with masks and bounding boxes. We further propose a novel, strong baseline model, the Fused Color-Depth Network (FusedCDNet), which can be trained solely under bounding-box supervision and, at inference, generates masks given only a bounding box in the first frame. The model thus offers three major advantages: a weakly-supervised training strategy that overcomes the high cost of annotation, a cross-modal fusion module that handles complex scenes, and weakly-supervised inference that makes it easy to use. Extensive experiments demonstrate that our proposed method performs on par with top fully-supervised algorithms. We will open-source our project at https://github.com/yjybuaa/depthvos/ to facilitate the development of RGBD VOS.
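The abstract says FusedCDNet is trained with bounding boxes alone but does not state its loss. As one illustration of how box-only supervision can train a mask predictor, the sketch below implements the box projection loss popularized by BoxInst (Tian et al., CVPR 2021). This is a swapped-in, widely used technique, not necessarily the loss used in FusedCDNet; all function names are hypothetical, masks are plain H x W lists of foreground probabilities, and boxes are assumed to be inclusive (x0, y0, x1, y1) pixel coordinates.

```python
from typing import List, Tuple

Mask = List[List[float]]          # H x W grid of foreground probabilities in [0, 1]
Box = Tuple[int, int, int, int]   # hypothetical convention: (x0, y0, x1, y1), inclusive


def box_to_mask(height: int, width: int, box: Box) -> Mask:
    """Rasterize an axis-aligned box into a binary mask (1.0 inside, 0.0 outside)."""
    x0, y0, x1, y1 = box
    return [[1.0 if (x0 <= x <= x1 and y0 <= y <= y1) else 0.0
             for x in range(width)] for y in range(height)]


def project(mask: Mask, axis: str) -> List[float]:
    """Max-project a mask onto the x-axis (one value per column) or the y-axis (one per row)."""
    if axis == "x":
        return [max(col) for col in zip(*mask)]
    return [max(row) for row in mask]


def dice_loss(pred: List[float], target: List[float], eps: float = 1e-6) -> float:
    """Dice loss with squared terms in the denominator, as in the BoxInst formulation."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def projection_loss(pred: Mask, box: Box) -> float:
    """Compare the predicted mask's x/y projections with those of the box mask."""
    target = box_to_mask(len(pred), len(pred[0]), box)
    return (dice_loss(project(pred, "x"), project(target, "x"))
            + dice_loss(project(pred, "y"), project(target, "y")))


if __name__ == "__main__":
    box = (1, 1, 2, 2)
    full = box_to_mask(4, 4, box)
    diagonal = [[0.0] * 4 for _ in range(4)]
    diagonal[1][1] = diagonal[2][2] = 1.0
    print(projection_loss(full, box))      # box-shaped prediction
    print(projection_loss(diagonal, box))  # sparse prediction with the same projections
```

Both predictions in the usage example receive zero loss, which shows why box supervision is "weak": it constrains only the mask's extent along each axis, not its interior shape. BoxInst therefore pairs this term with a pairwise color-similarity loss; how FusedCDNet resolves the same ambiguity, for example with depth cues, is not specified in this record.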
Pages: 2158-2170
Number of pages: 13