SpVOS: Efficient Video Object Segmentation With Triple Sparse Convolution

Cited by: 1
Authors
Lin, Weihao [1]
Chen, Tao [1]
Yu, Chong [2]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video object segmentation; convolutional neural networks; sparse convolution; proposal generation
DOI
10.1109/TIP.2023.3327588
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Semi-supervised video object segmentation (Semi-VOS), which requires annotating only the first frame of a video to segment subsequent frames, has received increasing attention recently. Among existing Semi-VOS pipelines, memory-matching-based methods have become the main research stream, as they can fully exploit temporal sequence information to obtain high-quality segmentation results. Although this type of method has achieved promising performance, the overall framework still suffers from heavy computation overhead, mainly caused by the per-frame dense convolution operations between high-resolution feature maps and each kernel filter. We therefore propose SpVOS, a sparse baseline for VOS, which develops a novel triple sparse convolution to reduce the computation cost of the overall VOS framework. The designed triple gate, which accounts for both spatial and temporal redundancy between adjacent video frames, adaptively makes a three-way decision on how to apply sparse convolution at each pixel, controlling the computation overhead of each layer while maintaining sufficient discrimination capability to distinguish similar objects and avoid error accumulation. A mixed sparse training strategy, coupled with a designed objective that incorporates the sparsity constraint, is also developed to balance VOS segmentation performance and computation cost. Experiments are conducted on two mainstream VOS datasets, DAVIS and YouTube-VOS. Results show that the proposed SpVOS outperforms other state-of-the-art sparse methods and even remains comparable to the typical non-sparse VOS baseline, achieving an overall score of 83.04% on the DAVIS-2017 validation set and 79.29% on the YouTube-VOS validation set (versus 82.88% and 80.36%, respectively, for the baseline) while saving up to 42% of FLOPs, showing its application potential for resource-constrained scenarios.
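The per-pixel triple gate described in the abstract can be illustrated with a small NumPy sketch. The abstract does not say what the three decisions are, so the labels below (SKIP, REUSE, COMPUTE), the function names, and the use of a 1x1 (pointwise) convolution are all assumptions made for illustration and not the authors' implementation. The sketch only shows how a three-way gate map might route each pixel and how the fraction of computed pixels relates to the convolution FLOPs that are saved.

```python
import numpy as np

# Hypothetical labels for the three gate decisions (not taken from the paper).
SKIP, REUSE, COMPUTE = 0, 1, 2


def triple_gated_pointwise_conv(x, prev_out, weight, gate):
    """Apply a 1x1 convolution only where the gate says COMPUTE.

    x        : (H, W, C_in)  features of the current frame
    prev_out : (H, W, C_out) output of this layer on the previous frame
    weight   : (C_in, C_out) pointwise convolution kernel
    gate     : (H, W)        integer map with values SKIP, REUSE or COMPUTE
    """
    out = np.zeros_like(prev_out)          # SKIP pixels stay at zero
    reuse = gate == REUSE
    compute = gate == COMPUTE
    out[reuse] = prev_out[reuse]           # copy the previous frame's result
    out[compute] = x[compute] @ weight     # dense math only on selected pixels
    return out


def conv_flops_saved(gate):
    """Fraction of this layer's convolution FLOPs avoided by the gate."""
    return 1.0 - float(np.mean(gate == COMPUTE))


x = np.ones((2, 2, 3))
prev_out = np.full((2, 2, 4), 5.0)
weight = np.ones((3, 4))
gate = np.array([[SKIP, REUSE], [COMPUTE, COMPUTE]])

out = triple_gated_pointwise_conv(x, prev_out, weight, gate)
saved = conv_flops_saved(gate)
```

In this toy case half of the pixels are computed, so half of the layer's convolution FLOPs are avoided. In the method the abstract describes, the gate is learned and trained with a sparsity-aware objective rather than fixed by hand as it is here.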
Pages: 5977-5991
Number of pages: 15