Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation

Cited by: 11

Authors:

Li, Liulei [1,4]

Wang, Wenguan [1]

Zhou, Tianfei [2]

Li, Jianwu [3]

Yang, Yi [1]

Affiliations:

[1] Zhejiang Univ, CCAI, ReLER, Hangzhou, Peoples R China

[2] Swiss Fed Inst Technol, Zurich, Switzerland

[3] Beijing Inst Technol, Beijing, Peoples R China

[4] Baidu VIS, Sunnyvale, CA USA

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.01794

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The objective of this paper is self-supervised learning of video object segmentation (VOS). We develop a unified framework which simultaneously models cross-frame dense correspondence for locally discriminative feature learning and embeds object-level context for target-mask decoding. As a result, it is able to directly learn to perform mask-guided sequential segmentation from unlabeled videos, in contrast to previous efforts, which usually rely on an oblique solution: cheaply "copying" labels according to pixel-wise correlations. Concretely, our algorithm alternates between i) clustering video pixels for creating pseudo segmentation labels ex nihilo; and ii) utilizing the pseudo labels to learn mask encoding and decoding for VOS. Unsupervised correspondence learning is further incorporated into this self-taught, mask embedding scheme, so as to ensure the generic nature of the learnt representation and avoid cluster degeneracy. Our algorithm sets a new state of the art on two standard benchmarks (i.e., DAVIS17 and YouTube-VOS), narrowing the gap between self- and fully-supervised VOS, in terms of both performance and network architecture design.
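
The "copying" baseline that the abstract contrasts against can be illustrated with a small sketch. It is not the paper's method: it shows one common form of correspondence-based label propagation, where each query pixel takes the label that receives the most softmax-weighted affinity from labeled reference pixels. The function name `propagate_labels`, the `temperature` value, and the toy features are all assumptions made for illustration.

```python
import math


def propagate_labels(ref_feats, ref_labels, qry_feats, num_classes, temperature=0.1):
    """Copy labels from reference pixels to query pixels via softmax affinity.

    Hypothetical helper: features are plain lists of floats, one per pixel.
    """
    def normalize(v):
        # L2-normalize so the dot product below is a cosine similarity.
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    ref = [normalize(f) for f in ref_feats]
    predictions = []
    for q in (normalize(f) for f in qry_feats):
        # Temperature-scaled cosine similarity to every reference pixel.
        sims = [sum(a * b for a, b in zip(q, r)) / temperature for r in ref]
        # Numerically stable softmax over the reference pixels.
        peak = max(sims)
        weights = [math.exp(s - peak) for s in sims]
        total = sum(weights)
        # Accumulate the affinity mass received by each class label.
        scores = [0.0] * num_classes
        for w, label in zip(weights, ref_labels):
            scores[label] += w / total
        predictions.append(max(range(num_classes), key=scores.__getitem__))
    return predictions


# Toy data (assumed): three labeled reference pixels, two unlabeled query pixels.
ref_feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
ref_labels = [0, 0, 1]
qry_feats = [[0.8, 0.2], [0.1, 0.9]]
predicted = propagate_labels(ref_feats, ref_labels, qry_feats, num_classes=2)
print(predicted)
```

In this scheme the mask is never encoded or decoded by the network; labels are only transferred through feature similarity, which is the limitation the abstract's mask embedding approach is meant to address.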

Pages: 18706-18716

Page count: 11

50 items in total

[1] Spatial-then-Temporal Self-Supervised Learning for Video Correspondence
Li, Rui
Liu, Dong
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023 : 2279 - 2288
[2] Discriminative Spatiotemporal Alignment for Self-Supervised Video Correspondence Learning
Wei, Qiaoqiao
Zhang, Hui
Yong, Jun-Hai
2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023 : 1841 - 1846
[3] Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder
Lin, Jingru
Yue, Xianghu
Ao, Junyi
Li, Haizhou
INTERSPEECH 2023, 2023 : 2988 - 2992
[4] Learning disentangled representation for self-supervised video object segmentation
Hou, Wenjie
Qin, Zheyun
Xi, Xiaoming
Lu, Xiankai
Yin, Yilong
NEUROCOMPUTING, 2022, 481 : 270 - 280
[5] Learning disentangled representation for self-supervised video object segmentation
Hou, Wenjie
Qin, Zheyun
Xi, Xiaoming
Lu, Xiankai
Yin, Yilong
Neurocomputing, 2022, 481 : 270 - 280
[6] Self-supervised Amodal Video Object Segmentation
Yao, Jian
Hong, Yuxin
Wang, Chiyu
Xiao, Tianjun
He, Tong
Locatello, Francesco
Wipf, David
Fu, Yanwei
Zhang, Zheng
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
[7] Self-Supervised Correspondence in Visuomotor Policy Learning
Florence, Peter
Manuelli, Lucas
Tedrake, Russ
IEEE ROBOTICS AND AUTOMATION LETTERS, 2020, 5 (02) : 492 - 499
[8] Contrastive Transformation for Self-supervised Correspondence Learning
Wang, Ning
Zhou, Wengang
Li, Houqiang
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 10174 - 10182
[9] BaSSL: Boundary-aware Self-Supervised Learning for Video Scene Segmentation
Mun, Jonghwan
Shin, Minchul
Han, Gunsoo
Lee, Sangho
Ha, Seongsu
Lee, Joonseok
Kim, Eun-Sol
COMPUTER VISION - ACCV 2022, PT IV, 2023, 13844 : 485 - 501
[10] Self-Supervised Deep TripleNet for Video Object Segmentation
Xu, Kai
Wen, Longyin
Li, Guorong
Huang, Qingming
IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 3530 - 3539
