SpVOS: Efficient Video Object Segmentation With Triple Sparse Convolution

Cited by: 1
Authors
Lin, Weihao [1]
Chen, Tao [1]
Yu, Chong [2]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video object segmentation; convolutional neural networks; sparse convolution; proposal generation
DOI
10.1109/TIP.2023.3327588
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Semi-supervised video object segmentation (Semi-VOS), which requires annotating only the first frame of a video to segment subsequent frames, has received increasing attention recently. Among existing Semi-VOS pipelines, the memory-matching-based approach is becoming the main research stream, as it can fully exploit temporal sequence information to obtain high-quality segmentation results. Although this type of method has achieved promising performance, the overall framework still suffers from heavy computation overhead, mainly caused by the per-frame dense convolution operations between high-resolution feature maps and each kernel filter. We therefore propose a sparse VOS baseline named SpVOS, which develops a novel triple sparse convolution to reduce the computation cost of the overall VOS framework. The designed triple gate, which takes into account both the spatial and temporal redundancy between adjacent video frames, adaptively makes a triple decision on how to apply the sparse convolution at each pixel, controlling the computation overhead of each layer while maintaining sufficient discrimination capability to distinguish similar objects and avoid error accumulation. A mixed sparse training strategy, coupled with a designed objective that incorporates the sparsity constraint, is also developed to balance VOS segmentation performance and computation cost. Experiments are conducted on two mainstream VOS datasets, DAVIS and YouTube-VOS. Results show that the proposed SpVOS achieves superior performance over other state-of-the-art sparse methods and even remains comparable to the typical non-sparse VOS baseline: it reaches an overall score of 83.04% (79.29%) on the DAVIS-2017 (YouTube-VOS) validation set, versus 82.88% for DAVIS-2017 and 80.36% for YouTube-VOS for the baseline, while saving up to 42% of FLOPs, showing its application potential in resource-constrained scenarios.
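To make the cost argument in the abstract concrete, the following is a minimal sketch of how evaluating a convolution only at selected pixels reduces the multiply-accumulate (MAC) count relative to a dense layer. It is not the paper's triple gate: the abstract does not describe the gate's three decisions, so a simple frame-difference threshold stands in for it here. The function names, the threshold `tau`, and the toy 4 x 4 frames are all assumptions made for illustration.

```python
def dense_conv_macs(height, width, c_in, c_out, k):
    """MACs of a stride-1, same-padded k x k convolution over a full feature map."""
    return height * width * c_in * c_out * k * k


def change_mask(prev, curr, tau):
    """Hypothetical stand-in for a learned gate: mark pixels whose value
    changed by more than tau between two adjacent frames."""
    return [[abs(c - p) > tau for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def sparse_conv_macs(mask, c_in, c_out, k):
    """MACs when the convolution is evaluated only at pixels flagged in mask."""
    active = sum(1 for row in mask for flag in row if flag)
    return active * c_in * c_out * k * k


def mac_saving(mask, c_in, c_out, k):
    """Fraction of dense MACs avoided by skipping the unflagged pixels."""
    dense = dense_conv_macs(len(mask), len(mask[0]), c_in, c_out, k)
    return 1.0 - sparse_conv_macs(mask, c_in, c_out, k) / dense


# Toy single-channel frames: a 2 x 2 object moves one pixel to the right.
prev = [[0.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 1.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
curr = [[0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 0.0, 0.0]]

mask = change_mask(prev, curr, tau=0.5)
saving = mac_saving(mask, c_in=64, c_out=64, k=3)
print(saving)  # 0.75: only 4 of the 16 pixels changed between frames
```

In this toy case the saving depends only on the fraction of active pixels, since the per-pixel cost `c_in * c_out * k * k` is the same for the dense and sparse layers. A real sparse layer would also pay for computing the gate and for gathering the active pixels, which this count ignores.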
Pages: 5977-5991
Page count: 15