VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation

Times cited: 1
Authors
Wang, Xudong [1]
Misra, Ishan
Zeng, Ziyun
Girdhar, Rohit
Darrell, Trevor
Affiliation
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
DOI
10.1109/CVPR52733.2024.02147
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Existing approaches to unsupervised video instance segmentation typically rely on motion estimates and struggle to track small or divergent motions. We present VideoCutLER, a simple method for unsupervised multi-instance video segmentation that uses neither motion-based learning signals such as optical flow nor training on natural videos. Our key insight is that training on high-quality pseudo masks and a simple video synthesis method is, surprisingly, sufficient for the resulting video model to segment and track multiple instances across video frames effectively. We show the first competitive unsupervised learning results on the challenging YouTubeVIS-2019 benchmark, achieving 50.7% AP_50^video and surpassing the previous state of the art by a large margin. VideoCutLER can also serve as a strong pretrained model for supervised video instance segmentation, exceeding DINO by 15.9% AP^video on YouTubeVIS-2019.
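The abstract's key insight, that pseudo masks plus a simple video synthesis step are enough to train a video model, can be illustrated with a minimal sketch. It assumes a single-channel image stored as nested lists and one binary pseudo mask per instance (for example, from an unsupervised image segmenter). The names `shift` and `synthesize_clip` are hypothetical, and a global random translation per frame stands in for whatever augmentations VideoCutLER actually applies. What the sketch shows is that because every frame is derived from the same image, each instance keeps the same identity in every frame by construction, so tracking labels come at no extra cost.

```python
import random
from typing import List, Optional, Tuple

Grid = List[List[int]]  # H x W grid: pixel intensities or 0/1 mask values


def shift(grid: Grid, dy: int, dx: int, fill: int = 0) -> Grid:
    """Translate a 2-D grid by (dy, dx); vacated cells are set to `fill`."""
    h, w = len(grid), len(grid[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = grid[y][x]
    return out


def synthesize_clip(
    image: Grid,
    masks: List[Grid],
    num_frames: int = 3,
    max_shift: int = 2,
    seed: Optional[int] = None,
) -> Tuple[List[Grid], List[List[Grid]]]:
    """Hypothetical sketch: turn one image plus pseudo masks into a short clip.

    Each frame applies the same random translation to the image and to every
    mask, so instance k keeps one identity across all frames. Returns
    (frames, tracks), where tracks[k][t] is instance k's mask in frame t.
    """
    rng = random.Random(seed)
    frames: List[Grid] = []
    tracks: List[List[Grid]] = [[] for _ in masks]
    for _ in range(num_frames):
        # One random offset per frame, shared by the image and all masks
        # (assumption: the real method may use richer augmentations).
        dy = rng.randint(-max_shift, max_shift)
        dx = rng.randint(-max_shift, max_shift)
        frames.append(shift(image, dy, dx))
        for k, mask in enumerate(masks):
            tracks[k].append(shift(mask, dy, dx))
    return frames, tracks


if __name__ == "__main__":
    # 6x6 image: background intensity 1, a 2x2 object of intensity 9 in the centre.
    image = [[9 if 2 <= y <= 3 and 2 <= x <= 3 else 1 for x in range(6)] for y in range(6)]
    mask = [[1 if 2 <= y <= 3 and 2 <= x <= 3 else 0 for x in range(6)] for y in range(6)]
    frames, tracks = synthesize_clip(image, [mask], num_frames=4, seed=0)
    for t, m in enumerate(tracks[0]):
        # Frame index and the instance mask's pixel count in that frame.
        print(t, sum(map(sum, m)))
```

Running the script prints each frame index with the mask's pixel count. With `max_shift=2` the 2x2 object never leaves the 6x6 frame, so the count stays at 4 in every frame. A training pipeline would presumably feed such clips, with their frame-consistent instance IDs, to a video instance segmentation model as supervision.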
Pages: 22755–22764
Number of pages: 10