SpVOS: Efficient Video Object Segmentation With Triple Sparse Convolution

Cited by: 1

Authors:

Lin, Weihao [1]

Chen, Tao [1]

Yu, Chong [2]

Affiliations:

[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China

[2] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2023 / Vol. 32

Funding:

National Natural Science Foundation of China;

Keywords:

Video object segmentation; convolutional neural networks; sparse convolution; PROPOSAL GENERATION;

DOI:

10.1109/TIP.2023.3327588

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Semi-supervised video object segmentation (Semi-VOS), which requires annotating only the first frame of a video to segment subsequent frames, has received increasing attention recently. Among existing Semi-VOS pipelines, the memory-matching-based approach is becoming the main research stream, as it can fully exploit temporal sequence information to obtain high-quality segmentation results. Although this type of method has achieved promising performance, the overall framework still suffers from heavy computation overhead, mainly caused by the per-frame dense convolution operations between high-resolution feature maps and each kernel filter. Therefore, we propose a sparse VOS baseline named SpVOS, which develops a novel triple sparse convolution to reduce the computation cost of the overall VOS framework. The designed triple gate, taking into account both the spatial and temporal redundancy between adjacent video frames, adaptively makes a triple decision on how to apply the sparse convolution at each pixel, controlling the computation overhead of each layer while maintaining sufficient discrimination capability to distinguish similar objects and avoid error accumulation. A mixed sparse training strategy, coupled with a designed objective that includes a sparsity constraint, is also developed to balance VOS segmentation performance and computation cost. Experiments are conducted on two mainstream VOS datasets, DAVIS and YouTube-VOS. Results show that the proposed SpVOS outperforms other state-of-the-art sparse methods and even remains comparable to the typical non-sparse VOS baseline: it achieves an overall score of 83.04% (79.29%) on the DAVIS-2017 (YouTube-VOS) validation set, versus 82.88% for DAVIS-2017 and 80.36% for YouTube-VOS with the baseline, while saving up to 42% of FLOPs, showing its application potential for resource-constrained scenarios.
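
As a rough illustration of why skipping convolution at some pixels lowers the FLOP count reported in the abstract, the following minimal sketch counts multiply-accumulate operations for convolution layers evaluated on only a fraction of their output pixels. It is not the SpVOS triple gate, whose three decisions are not specified in this record. The function names, the layer shapes, and the per-layer active fractions are all hypothetical, and the cost of the gate itself is ignored.

```python
def conv_macs(h, w, c_in, c_out, k, active_fraction=1.0):
    """Multiply-accumulate count of a k x k convolution that is evaluated
    on only `active_fraction` of its h x w output pixels (1.0 = dense)."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must lie in [0, 1]")
    active_pixels = round(h * w * active_fraction)
    # Each computed output pixel costs k*k*c_in MACs per output channel.
    return active_pixels * k * k * c_in * c_out


def flops_saving(layers):
    """Fraction of MACs saved relative to dense evaluation.

    `layers` is a list of (h, w, c_in, c_out, k, active_fraction) tuples;
    the gating overhead is deliberately left out (simplifying assumption).
    """
    dense = sum(conv_macs(h, w, ci, co, k) for h, w, ci, co, k, _ in layers)
    sparse = sum(conv_macs(h, w, ci, co, k, f) for h, w, ci, co, k, f in layers)
    return 1.0 - sparse / dense


if __name__ == "__main__":
    # Hypothetical two-layer example: shapes and fractions are made up.
    example_layers = [
        (120, 216, 64, 64, 3, 0.50),
        (60, 108, 128, 128, 3, 0.70),
    ]
    print(f"estimated saving: {flops_saving(example_layers):.1%}")
```

Under these assumptions the saving of a single layer equals one minus its active fraction, so the network-level saving is a cost-weighted average of the per-layer skip rates.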


Pages: 5977-5991

Number of pages: 15
