MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model

Authors
Zhang, Zhenghao [1]
Zhang, Shengfan [1]
Dai, Zuozhuo [1]
Dong, Zilong [1]
Zhu, Siyu [2]
Affiliations
[1] Alibaba Group, Hangzhou 310030, People's Republic of China
[2] Fudan University, Shanghai 200433, People's Republic of China
Keywords
Vision foundation model; Video instance segmentation; Deep learning
DOI
10.1016/j.patcog.2024.111100
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Current state-of-the-art techniques for video object segmentation require extensive training on video datasets with mask annotations, which limits their zero-shot transfer to new image distributions and tasks. Recent foundation models, particularly for image segmentation, have shown robust generalization and introduced a prompt-driven paradigm for a variety of downstream segmentation tasks on new data distributions. This study explores the potential of vision foundation models under diverse prompt strategies and proposes a mask-free approach for unsupervised video object segmentation. To further improve the efficacy of prompt learning in diverse and complex video scenes, we introduce a spatial-temporal decoupled deformable attention mechanism that establishes an effective correlation between intra- and inter-frame features. Extensive experiments on the DAVIS2017-unsupervised, YoutubeVIS19&21, and OIVS datasets demonstrate that the proposed approach, without mask supervision, outperforms existing mask-supervised methods, and that it generalizes to weakly annotated video datasets.
Pages: 12