Weakly-Supervised RGBD Video Object Segmentation

Authors
Yang, Jinyu [1 ,2 ]
Gao, Mingqi [1 ,3 ]
Zheng, Feng [4 ]
Zhen, Xiantong [5 ]
Ji, Rongrong [6 ]
Shao, Ling [7 ]
Leonardis, Ales [8 ]
Affiliations
[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen 518055, Peoples R China
[2] Univ Birmingham, Birmingham B15 2TT, England
[3] Univ Warwick, Coventry CV4 7AL, England
[4] Southern Univ Sci & Technol, Shenzhen 518055, Peoples R China
[5] Guangdong Univ Petrochem Technol, Coll Comp Sci, Maoming 525011, Peoples R China
[6] Xiamen Univ, Sch Informat, Dept Artificial Intelligence, Media Analyt & Comp Lab, Xiamen 361005, Peoples R China
[7] Univ Chinese Acad Sci, UCAS Terminus AI Lab, Beijing 101408, Peoples R China
[8] Univ Birmingham, Sch Comp Sci, Birmingham B15 2TT, England
Funding
National Natural Science Foundation of China;
Keywords
Annotations; Object segmentation; Training; Target tracking; Task analysis; Object tracking; Benchmark testing; RGBD data; video object segmentation; visual tracking; TRACKING;
DOI
10.1109/TIP.2024.3374130
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Depth information opens up new opportunities for video object segmentation (VOS) to be more accurate and robust in complex scenes. However, the RGBD VOS task remains largely unexplored because RGBD data are expensive to collect and segmentation masks are time-consuming to annotate. In this work, we first introduce a new benchmark for RGBD VOS, named DepthVOS, which contains 350 videos (over 55k frames in total) annotated with masks and bounding boxes. We further propose a novel, strong baseline model, the Fused Color-Depth Network (FusedCDNet), which can be trained solely under the supervision of bounding boxes and, at inference, generates masks given only a bounding box in the first frame. The model thereby offers three major advantages: a weakly-supervised training strategy to overcome the high cost of annotation, a cross-modal fusion module to handle complex scenes, and weakly-supervised inference to promote ease of use. Extensive experiments demonstrate that our proposed method performs on par with top fully-supervised algorithms. We will open-source our project at https://github.com/yjybuaa/depthvos/ to facilitate the development of RGBD VOS.
Pages: 2158-2170
Number of pages: 13
Related papers
  • [21] Token Contrast for Weakly-Supervised Semantic Segmentation
    Ru, Lixiang
    Zheng, Hehang
    Zhan, Yibing
    Du, Bo
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 3093 - 3102
  • [22] Rethinking CAM in Weakly-Supervised Semantic Segmentation
    Song, Yuqi
    Li, Xiaojie
    Shi, Canghong
    Feng, Shihao
    Wang, Xin
    Luo, Yong
    Xi, Wu
    IEEE ACCESS, 2022, 10 : 126440 - 126450
  • [23] WEAKLY-SUPERVISED PLATE AND FOOD REGION SEGMENTATION
    Shimoda, Wataru
    Yanai, Keiji
    2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2020,
  • [24] Adversarial Learning of Object-Aware Activation Map for Weakly-Supervised Semantic Segmentation
    Chen, Junliang
    Lu, Weizeng
    Li, Yuexiang
    Shen, Linlin
    Duan, Jinming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (08) : 3935 - 3946
  • [25] Predicting Segmentation "Easiness" from the Consistency for Weakly-Supervised Segmentation
    Shimoda, Wataru
    Yanai, Keiji
    PROCEEDINGS 2017 4TH IAPR ASIAN CONFERENCE ON PATTERN RECOGNITION (ACPR), 2017, : 292 - 297
  • [26] Weakly-Supervised Video Object Grounding via Learning Uni-Modal Associations
    Wang, Wei
    Gao, Junyu
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 6329 - 6340
  • [27] A Weakly-Supervised Cross-Domain Query Framework for Video Camouflage Object Detection
    Lu, Zelin
    Xie, Liang
    Zhao, Xing
    Xu, Binwei
    Liang, Haoran
    Liang, Ronghua
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (02) : 1506 - 1518
  • [28] Sequential Clique Optimization for Unsupervised and Weakly Supervised Video Object Segmentation
    Koh, Yeong Jun
    Heo, Yuk
    Kim, Chang-Su
    ELECTRONICS, 2022, 11 (18)
  • [29] Weakly-Supervised Dual Clustering for Image Semantic Segmentation
    Liu, Yang
    Liu, Jing
    Li, Zechao
    Tang, Jinhui
    Lu, Hanqing
    2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013, : 2075 - 2082
  • [30] Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation
    Kim, Beomyoung
    Han, Sangeun
    Kim, Junmo
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 1754 - 1761