MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model

Cited by: 0
Authors
Zhang, Zhenghao [1]
Zhang, Shengfan [1]
Dai, Zuozhuo [1]
Dong, Zilong [1]
Zhu, Siyu [2]
Affiliations
[1] Alibaba Grp, Hangzhou 310030, Peoples R China
[2] Fudan Univ, Shanghai 200433, Peoples R China
Keywords
Vision foundation model; Video instance segmentation; Deep learning;
DOI
10.1016/j.patcog.2024.111100
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The current state-of-the-art techniques for video object segmentation necessitate extensive training on video datasets with mask annotations, thereby constraining their ability to transfer zero-shot to new image distributions and tasks. However, recent advancements in foundation models, particularly in the domain of image segmentation, have showcased robust generalization capabilities, introducing a novel prompt-driven paradigm for a variety of downstream segmentation challenges on new data distributions. This study delves into the potential of vision foundation models using diverse prompt strategies and proposes a mask-free approach for unsupervised video object segmentation. To further improve the efficacy of prompt learning in diverse and complex video scenes, we introduce a spatial-temporal decoupled deformable attention mechanism to establish an effective correlation between intra- and inter-frame features. Extensive experiments conducted on the DAVIS2017-unsupervised, YoutubeVIS19&21, and OIVS datasets demonstrate the superior performance of the proposed approach without mask supervision when compared to existing mask-supervised methods, as well as its capacity to generalize to weakly-annotated video datasets.
Pages: 12
Related papers
50 in total
  • [21] Scribble-Supervised Video Object Segmentation via Scribble Enhancement
    Gao, Xingyu
    Li, Zuolei
    Shi, Hailong
    Chen, Zhenyu
    Zhao, Peilin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (04) : 2999 - 3012
  • [22] Weakly-Supervised Video Object Grounding via Stable Context Learning
    Wang, Wei
    Gao, Junyu
    Xu, Changsheng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 760 - 768
  • [23] Learning with Noise: Mask-Guided Attention Model for Weakly Supervised Nuclei Segmentation
    Guo, Ruoyu
    Pagnucco, Maurice
    Song, Yang
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT II, 2021, 12902 : 461 - 470
  • [24] A progressive segmentation with weight contrast label enhancement for weakly supervised video salient object detection
    Lu, Zelin
    Liang, Haoran
    Xu, Binwei
    Liang, Ronghua
    IET IMAGE PROCESSING, 2023, 17 (10) : 2925 - 2936
  • [25] Query-Memory Re-Aggregation for Weakly-supervised Video Object Segmentation
    Lin, Fanchao
    Xie, Hongtao
    Li, Yan
    Zhang, Yongdong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 2038 - 2046
  • [26] Self-supervised video object segmentation via pseudo label rectification
    Guo, Pinxue
    Zhang, Wei
    Li, Xiaoqiang
    Fan, Jianping
    Zhang, Wenqiang
    PATTERN RECOGNITION, 2025, 163
  • [27] Distance-Guided Mask Propagation Model for Efficient Video Object Segmentation
    Liu, Jiajia
    Dai, Hongning
    Li, Bo
    Tang, Gaozhong
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [28] Weakly Supervised Few-Shot Semantic Segmentation via Pseudo Mask Enhancement and Meta Learning
    Zhang, Man
    Zhou, Yong
    Liu, Bing
    Zhao, Jiaqi
    Yao, Rui
    Shao, Zhiwen
    Zhu, Hancheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7980 - 7991
  • [29] Learning robust correlation with foundation model for weakly-supervised few-shot segmentation
    Huang, Xinyang
    Zhu, Chuang
    Liu, Kebin
    Ren, Ruiying
    Liu, Shengjie
    KNOWLEDGE-BASED SYSTEMS, 2024, 299
  • [30] Weakly-Supervised Video Object Grounding via Learning Uni-Modal Associations
    Wang, Wei
    Gao, Junyu
    Xu, Changsheng
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 6329 - 6340