Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation

Cited by: 11

Authors:

Li, Liulei ^{[1,4]}

Wang, Wenguan ^{[1]}

Zhou, Tianfei ^{[2]}

Li, Jianwu ^{[3]}

Yang, Yi ^{[1]}

Affiliations:

[1] Zhejiang Univ, CCAI, ReLER, Hangzhou, Peoples R China

[2] Swiss Fed Inst Technol, Zurich, Switzerland

[3] Beijing Inst Technol, Beijing, Peoples R China

[4] Baidu VIS, Sunnyvale, CA USA

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.01794

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The objective of this paper is self-supervised learning of video object segmentation (VOS). We develop a unified framework that simultaneously models cross-frame dense correspondence for locally discriminative feature learning and embeds object-level context for target-mask decoding. As a result, it can learn to perform mask-guided sequential segmentation directly from unlabeled videos, in contrast to previous efforts, which usually rely on an oblique solution: cheaply "copying" labels according to pixel-wise correlations. Concretely, our algorithm alternates between i) clustering video pixels to create pseudo segmentation labels ex nihilo; and ii) utilizing the pseudo labels to learn mask encoding and decoding for VOS. Unsupervised correspondence learning is further incorporated into this self-taught mask-embedding scheme, so as to ensure the generic nature of the learnt representation and avoid cluster degeneracy. Our algorithm sets a new state of the art on two standard benchmarks (i.e., DAVIS17 and YouTube-VOS), narrowing the gap between self- and fully-supervised VOS in terms of both performance and network architecture design.
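
The two ideas the abstract contrasts can be illustrated with a small NumPy sketch. It is not the authors' implementation: this record does not describe the paper's feature extractor, clustering procedure, or losses, so plain k-means stands in for step i), and a softmax affinity stands in for the "copying" of labels by pixel-wise correlation. The function names `cluster_pixels` and `copy_labels`, the `temperature` parameter, and the toy features are all hypothetical.

```python
import numpy as np


def cluster_pixels(features, num_clusters, num_iters=10, seed=0):
    """Plain k-means over per-pixel features (an assumed stand-in for step i).

    features: float array of shape (num_pixels, dim).
    Returns one cluster id per pixel, usable as a pseudo segmentation label.
    """
    rng = np.random.default_rng(seed)
    # Initialise the centres from randomly chosen pixels.
    centres = features[rng.choice(len(features), num_clusters, replace=False)]
    for _ in range(num_iters):
        # Squared Euclidean distance from every pixel to every centre.
        dists = ((features[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for c in range(num_clusters):
            members = features[labels == c]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[c] = members.mean(axis=0)
    return labels


def copy_labels(ref_features, ref_labels, query_features, num_classes,
                temperature=0.1):
    """Correspondence-style label "copying" (the baseline idea, not the paper's decoder).

    Each query pixel takes a softmax-weighted vote over reference pixel labels.
    """
    logits = query_features @ ref_features.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    affinity = np.exp(logits)
    affinity /= affinity.sum(axis=1, keepdims=True)
    one_hot = np.eye(num_classes)[ref_labels]
    return (affinity @ one_hot).argmax(axis=1)


# Toy usage: four "pixels" forming two well-separated groups.
toy_features = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
pseudo_labels = cluster_pixels(toy_features, num_clusters=2)
```

In the framework the abstract describes, pseudo labels of this kind would supervise a mask encoder and decoder rather than being propagated by affinity alone; that learning step depends on network details this record does not give, so it is left out of the sketch.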

Pages: 18706-18716

Page count: 11

