Decoupling Features in Hierarchical Propagation for Video Object Segmentation

Cited by: 0
Authors
Yang, Zongxin [1, 2]
Yang, Yi [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, CCAI, Hangzhou, Peoples R China
[2] Baidu Res, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper develops a more effective method of hierarchical propagation for semi-supervised Video Object Segmentation (VOS). Based on vision transformers, the recently developed Associating Objects with Transformers (AOT) approach introduces hierarchical propagation into VOS and has shown promising results. Hierarchical propagation gradually propagates information from past frames to the current frame and transfers the current-frame feature from object-agnostic to object-specific. However, the increase of object-specific information inevitably leads to the loss of object-agnostic visual information in deep propagation layers. To solve this problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach. First, DeAOT decouples the hierarchical propagation of object-agnostic and object-specific embeddings by handling them in two independent branches. Second, to compensate for the additional computation from dual-branch propagation, we propose an efficient module for constructing hierarchical propagation, the Gated Propagation Module, which is carefully designed with single-head attention. Extensive experiments show that DeAOT significantly outperforms AOT in both accuracy and efficiency. On YouTube-VOS, DeAOT achieves 86.0% at 22.4 fps and 82.0% at 53.4 fps. Without test-time augmentations, we achieve new state-of-the-art performance on four benchmarks: YouTube-VOS (86.2%), DAVIS 2017 (86.2%), DAVIS 2016 (92.9%), and VOT 2020 (0.622). Project page: https://github.com/z-x-yang/AOT.
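The two ideas named in the abstract, propagation in two separate branches and a gated module built on single-head attention, can be illustrated with a small sketch. This is not the authors' Gated Propagation Module: the abstract does not give its equations, so the sigmoid gate, the reuse of one attention map by both branches, and every function name below are assumptions chosen for illustration. Learned projections and normalization layers are omitted.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def attention_map(query, key):
    """Single-head scaled dot-product attention weights (one map, no head split)."""
    scale = 1.0 / math.sqrt(len(query[0]))
    key_t = [list(col) for col in zip(*key)]
    scores = [[s * scale for s in row] for row in matmul(query, key_t)]
    return softmax_rows(scores)

def gated_propagate(weights, value, gate_input):
    """Aggregate `value` with `weights`, then scale element-wise by a sigmoid gate.

    Assumption: the gate is computed directly from the current-frame features.
    """
    attended = matmul(weights, value)
    return [[sigmoid(g) * a for g, a in zip(g_row, a_row)]
            for g_row, a_row in zip(gate_input, attended)]

def dual_branch_step(cur_visual, mem_visual, cur_id, mem_id):
    """One propagation step with separate visual (object-agnostic) and
    ID (object-specific) branches.

    Assumption: the attention map is computed once from visual embeddings
    and reused by the ID branch, so the two feature streams never mix.
    """
    weights = attention_map(cur_visual, mem_visual)
    new_visual = gated_propagate(weights, mem_visual, cur_visual)
    new_id = gated_propagate(weights, mem_id, cur_id)
    return new_visual, new_id, weights
```

Here `cur_*` holds one row per current-frame location and `mem_*` one row per memory location; the visual and ID embeddings may have different channel counts because each branch only shares the attention weights, not the features.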
Pages: 13