Decoupling Features in Hierarchical Propagation for Video Object Segmentation

Citations: 0
Authors
Yang, Zongxin [1, 2]
Yang, Yi [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou, Peoples R China
[2] Baidu Res, Beijing, Peoples R China
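Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes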
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper focuses on developing a more effective method of hierarchical propagation for semi-supervised Video Object Segmentation (VOS). Based on vision transformers, the recently developed Associating Objects with Transformers (AOT) approach introduces hierarchical propagation into VOS and has shown promising results. The hierarchical propagation can gradually propagate information from past frames to the current frame and transfer the current frame feature from object-agnostic to object-specific. However, the increase of object-specific information will inevitably lead to the loss of object-agnostic visual information in deep propagation layers. To solve such a problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach. Firstly, DeAOT decouples the hierarchical propagation of object-agnostic and object-specific embeddings by handling them in two independent branches. Secondly, to compensate for the additional computation from dual-branch propagation, we propose an efficient module for constructing hierarchical propagation, i.e., the Gated Propagation Module, which is carefully designed with single-head attention. Extensive experiments show that DeAOT significantly outperforms AOT in both accuracy and efficiency. On YouTube-VOS, DeAOT can achieve 86.0% at 22.4 fps and 82.0% at 53.4 fps. Without test-time augmentations, we achieve new state-of-the-art performance on four benchmarks, i.e., YouTube-VOS (86.2%), DAVIS 2017 (86.2%), DAVIS 2016 (92.9%), and VOT 2020 (0.622). Project page: https://github.com/z-x-yang/AOT.
Pages: 13
Related papers
50 items in total
  • [21] CONTEXT PROPAGATION FROM PROPOSALS FOR SEMANTIC VIDEO OBJECT SEGMENTATION
    Wang, Tinghuai
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 256 - 260
  • [22] Integration of motion and image features for automatic video object segmentation
    Wei, W
    Ngan, KN
    ICIP: 2004 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOLS 1- 5, 2004, : 361 - 364
  • [23] Video object segmentation research based on features joint modeling
    Li, Zong-Min
    Gong, Xu-Chao
    Liu, Yu-Jie
    Jisuanji Xuebao/Chinese Journal of Computers, 2013, 36 (11): : 2356 - 2363
  • [24] Hierarchical semi-automatic video object segmentation for multimedia applications
    Cooray, S
    O'Connor, N
    Marlow, S
    Murphy, N
    Curran, T
    INTERNET MULTIMEDIA MANAGEMENT SYSTEMS II, 2001, 4519 : 10 - 19
  • [25] Automatic Video Object Segmentation Using Volume Growing and Hierarchical Clustering
    Fatih Porikli
    Yao Wang
    EURASIP Journal on Advances in Signal Processing, 2004
  • [26] Automatic video object segmentation using volume growing and hierarchical clustering
    Porikli, F
    Wang, Y
    EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2004, 2004 (06) : 814 - 832
  • [27] Dual Attention Based Network with Hierarchical ConvLSTM for Video Object Segmentation
    Zhao, Zongji
    Zhao, Sanyuan
    PATTERN RECOGNITION AND COMPUTER VISION, PT IV, 2021, 13022 : 323 - 335
  • [28] Automatic video object segmentation using volume growing and hierarchical clustering
    Porikli, F. (fatih@merl.com), 1600, Hindawi Publishing Corporation (2004):
  • [29] Efficient frame-sequential label propagation for video object segmentation
    Yadang Chen
    Chuanyan Hao
    Wen Wu
    Enhua Wu
    Multimedia Tools and Applications, 2018, 77 : 6117 - 6133
  • [30] Fast Video Object Segmentation by Reference-Guided Mask Propagation
    Oh, Seoung Wug
    Lee, Joon-Young
    Sunkavalli, Kalyan
    Kim, Seon Joo
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7376 - 7385