Decoupling Features in Hierarchical Propagation for Video Object Segmentation

Cited by: 0

Authors:

Yang, Zongxin [1,2]

Yang, Yi [1]

Affiliations:

[1] Zhejiang University, College of Computer Science and Technology, CCAI, Hangzhou, People's Republic of China

[2] Baidu Research, Beijing, People's Republic of China

Source:

Advances in Neural Information Processing Systems 35 (NeurIPS 2022) | 2022

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper focuses on developing a more effective method of hierarchical propagation for semi-supervised Video Object Segmentation (VOS). Based on vision transformers, the recently developed Associating Objects with Transformers (AOT) approach introduces hierarchical propagation into VOS and has shown promising results. Hierarchical propagation gradually propagates information from past frames to the current frame and transforms the current-frame feature from object-agnostic to object-specific. However, the increase of object-specific information inevitably leads to the loss of object-agnostic visual information in deep propagation layers. To solve this problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach. First, DeAOT decouples the hierarchical propagation of object-agnostic and object-specific embeddings by handling them in two independent branches. Second, to compensate for the additional computation from dual-branch propagation, we propose an efficient module for constructing hierarchical propagation, i.e., the Gated Propagation Module, which is carefully designed with single-head attention. Extensive experiments show that DeAOT significantly outperforms AOT in both accuracy and efficiency. On YouTube-VOS, DeAOT achieves 86.0% at 22.4 fps and 82.0% at 53.4 fps. Without test-time augmentations, we achieve new state-of-the-art performance on four benchmarks, i.e., YouTube-VOS (86.2%), DAVIS 2017 (86.2%), DAVIS 2016 (92.9%), and VOT 2020 (0.622). Project page: https://github.com/z-x-yang/AOT.
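
The dual-branch idea in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the sigmoid gate driven by the current-frame embedding, and the choice to reuse one set of attention weights for both branches are all assumptions made for illustration, and there are no learned projections. It only shows one visual (object-agnostic) and one ID (object-specific) embedding being propagated from memory frames through single-head attention and then gated.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Single-head scaled dot-product attention weights, one row per query."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]

def apply_weights(weights, values):
    """Weighted sum of value vectors for each row of attention weights."""
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]

def gate(gate_inputs, propagated):
    """Element-wise sigmoid gate (hypothetical form of the gating step)."""
    return [
        [p / (1.0 + math.exp(-g)) for g, p in zip(g_row, p_row)]
        for g_row, p_row in zip(gate_inputs, propagated)
    ]

def decoupled_propagation(cur_visual, cur_id, mem_visual, mem_id):
    """Propagate the visual and ID branches separately.

    Assumption: attention weights come from the visual branch only and are
    reused for the ID branch, so the two embeddings are never mixed.
    """
    weights = attention_weights(cur_visual, mem_visual)
    new_visual = gate(cur_visual, apply_weights(weights, mem_visual))
    new_id = gate(cur_id, apply_weights(weights, mem_id))
    return new_visual, new_id
```

With a single memory location the attention weight is 1, so each branch simply receives its own memory value scaled by the gate; the visual output never depends on the ID values, which is the decoupling the abstract describes.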


Pages: 13
