Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation

Cited by: 11
Authors
Li, Liulei [1, 4]
Wang, Wenguan [1]
Zhou, Tianfei [2]
Li, Jianwu [3]
Yang, Yi [1]
Affiliations
[1] Zhejiang Univ, CCAI, ReLER, Hangzhou, Peoples R China
[2] Swiss Fed Inst Technol, Zurich, Switzerland
[3] Beijing Inst Technol, Beijing, Peoples R China
[4] Baidu VIS, Sunnyvale, CA USA
DOI
10.1109/CVPR52729.2023.01794
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The objective of this paper is self-supervised learning of video object segmentation (VOS). We develop a unified framework which simultaneously models cross-frame dense correspondence for locally discriminative feature learning and embeds object-level context for target-mask decoding. As a result, it is able to directly learn to perform mask-guided sequential segmentation from unlabeled videos, in contrast to previous efforts usually relying on an oblique solution: cheaply "copying" labels according to pixel-wise correlations. Concretely, our algorithm alternates between i) clustering video pixels for creating pseudo segmentation labels ex nihilo; and ii) utilizing the pseudo labels to learn mask encoding and decoding for VOS. Unsupervised correspondence learning is further incorporated into this self-taught, mask embedding scheme, so as to ensure the generic nature of the learnt representation and avoid cluster degeneracy. Our algorithm sets a new state of the art on two standard benchmarks (i.e., DAVIS17 and YouTube-VOS), narrowing the gap between self- and fully-supervised VOS, in terms of both performance and network architecture design.
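
For orientation, the "copying" baseline that the abstract contrasts itself with can be sketched in a few lines. This is a minimal illustration of label propagation by pixel-wise correlation, not the method proposed in the paper. The function name `propagate_labels`, the toy 2-D features, and the temperature value 0.07 are all assumptions made for this example.

```python
import math


def propagate_labels(ref_feats, ref_labels, qry_feats, num_classes, temperature=0.07):
    """Copy labels from reference pixels to query pixels via softmax affinity.

    Hypothetical helper for illustration only; features are plain lists of floats.
    """
    predictions = []
    for q in qry_feats:
        # Dot-product similarity between this query pixel and every reference pixel.
        sims = [sum(a * b for a, b in zip(q, r)) / temperature for r in ref_feats]
        # Numerically stable softmax over the reference pixels.
        peak = max(sims)
        weights = [math.exp(s - peak) for s in sims]
        total = sum(weights)
        # Affinity-weighted vote over the reference labels.
        scores = [0.0] * num_classes
        for w, label in zip(weights, ref_labels):
            scores[label] += w / total
        predictions.append(max(range(num_classes), key=scores.__getitem__))
    return predictions


# Toy data (assumed): three labeled reference pixels, two unlabeled query pixels.
ref_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
ref_labels = [0, 0, 1]
qry_feats = [[0.95, 0.05], [0.1, 0.9]]
predicted = propagate_labels(ref_feats, ref_labels, qry_feats, num_classes=2)
print(predicted)
```

Each query pixel simply inherits the label of the reference pixels it most resembles, so no mask encoder or decoder is ever trained. That is the limitation the abstract describes as an oblique solution.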
Pages: 18706-18716
Page count: 11