Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation

Cited by: 11

Authors:

Li, Liulei [1,4]

Wang, Wenguan [1]

Zhou, Tianfei [2]

Li, Jianwu [3]

Yang, Yi [1]

Affiliations:

[1] Zhejiang Univ, CCAI, ReLER, Hangzhou, Peoples R China

[2] Swiss Fed Inst Technol, Zurich, Switzerland

[3] Beijing Inst Technol, Beijing, Peoples R China

[4] Baidu VIS, Sunnyvale, CA USA

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

DOI:

10.1109/CVPR52729.2023.01794

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The objective of this paper is self-supervised learning of video object segmentation (VOS). We develop a unified framework that simultaneously models cross-frame dense correspondence for locally discriminative feature learning and embeds object-level context for target-mask decoding. As a result, it can learn mask-guided sequential segmentation directly from unlabeled videos, in contrast to previous efforts that usually rely on an oblique solution: cheaply "copying" labels according to pixel-wise correlations. Concretely, our algorithm alternates between i) clustering video pixels to create pseudo segmentation labels ex nihilo; and ii) using the pseudo labels to learn mask encoding and decoding for VOS. Unsupervised correspondence learning is further incorporated into this self-taught mask embedding scheme to ensure the generic nature of the learnt representation and avoid cluster degeneracy. Our algorithm sets a new state of the art on two standard benchmarks (i.e., DAVIS17 and YouTube-VOS), narrowing the gap between self-supervised and fully supervised VOS in terms of both performance and network architecture design.
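Step i) of the abstract, clustering video pixels into pseudo segmentation labels, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-pixel features are already available as an array, and it substitutes plain k-means with farthest-point initialisation for whatever clustering procedure the paper actually uses. The function name `kmeans_pseudo_labels` and all of its parameters are hypothetical.

```python
import numpy as np

def kmeans_pseudo_labels(features, num_clusters, num_iters=10, seed=0):
    """Cluster per-pixel features of a clip into pseudo segmentation labels.

    features: array of shape (T, H, W, C), one C-dim feature per pixel
              (assumed to come from some feature extractor).
    Returns an integer array of shape (T, H, W) with values in
    [0, num_clusters).
    """
    t, h, w, c = features.shape
    flat = features.reshape(-1, c).astype(np.float64)
    rng = np.random.default_rng(seed)

    # Farthest-point initialisation: start from one random pixel, then
    # repeatedly add the pixel farthest from all centroids chosen so far.
    chosen = [flat[rng.integers(flat.shape[0])]]
    for _ in range(1, num_clusters):
        diffs = flat[:, None, :] - np.stack(chosen)[None, :, :]
        nearest = (diffs ** 2).sum(axis=-1).min(axis=1)
        chosen.append(flat[nearest.argmax()])
    centroids = np.stack(chosen)

    def assign(cents):
        # Index of the nearest centroid (squared Euclidean) for every pixel.
        diffs = flat[:, None, :] - cents[None, :, :]
        return (diffs ** 2).sum(axis=-1).argmin(axis=1)

    for _ in range(num_iters):
        labels = assign(centroids)
        for k in range(num_clusters):
            members = flat[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)

    return assign(centroids).reshape(t, h, w)
```

In the alternating scheme the abstract describes, label maps of this kind would serve as the targets for step ii), learning mask encoding and decoding; that second step is not sketched here.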

Pages: 18706-18716

Page count: 11

