MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model

Cited by: 0
Authors
Zhang, Zhenghao [1]
Zhang, Shengfan [1]
Dai, Zuozhuo [1]
Dong, Zilong [1]
Zhu, Siyu [2]
Affiliations
[1] Alibaba Grp, Hangzhou 310030, Peoples R China
[2] Fudan Univ, Shanghai 200433, Peoples R China
Keywords
Vision foundation model; Video instance segmentation; Deep learning
DOI
10.1016/j.patcog.2024.111100
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The current state-of-the-art techniques for video object segmentation necessitate extensive training on video datasets with mask annotations, thereby constraining their ability to transfer zero-shot learning to new image distributions and tasks. However, recent advancements in foundation models, particularly in the domain of image segmentation, have showcased robust generalization capabilities, introducing a novel prompt-driven paradigm for a variety of downstream segmentation challenges on new data distributions. This study delves into the potential of vision foundation models using diverse prompt strategies and proposes a mask-free approach for unsupervised video object segmentation. To further improve the efficacy of prompt learning in diverse and complex video scenes, we introduce a spatial-temporal decoupled deformable attention mechanism to establish an effective correlation between intra- and inter-frame features. Extensive experiments conducted on the DAVIS2017-unsupervised, YoutubeVIS19&21, and OIVS datasets demonstrate the superior performance of the proposed approach without mask supervision when compared to existing mask-supervised methods, as well as its capacity to generalize to weakly-annotated video datasets.
Pages: 12