Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

Times cited: 0

Authors:

Cheng, Ho Kei [1]

Tai, Yu-Wing [2]

Tang, Chi-Keung [3]

Affiliations:

[1] University of Illinois, Urbana, IL 61801, USA

[2] Kuaishou Technology, Beijing, People's Republic of China

[3] Hong Kong University of Science and Technology, Hong Kong, People's Republic of China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper presents a simple yet effective approach to modeling space-time correspondences in the context of video object segmentation. Unlike most existing approaches, we establish correspondences directly between frames without re-encoding the mask features for every object, leading to a highly efficient and robust framework. With the correspondences, every node in the current query frame is inferred by aggregating features from the past in an associative fashion. We cast the aggregation process as a voting problem and find that the existing inner-product affinity leads to poor use of memory, with a small (fixed) subset of memory nodes dominating the votes regardless of the query. In light of this phenomenon, we propose using the negative squared Euclidean distance to compute the affinities instead. We validate that every memory node now has a chance to contribute, and experimentally show that such diversified voting is beneficial to both memory efficiency and inference accuracy. The synergy of correspondence networks and diversified voting works exceedingly well, achieving new state-of-the-art results on both the DAVIS and YouTubeVOS datasets while running significantly faster, at 20+ FPS for multiple objects, without bells and whistles.
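The affinity-and-voting step described in the abstract can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the channel-first key layout, the 1/sqrt(C) scaling, the hand-built toy keys, and the `top_vote_coverage` diagnostic are all assumptions made for this example. It compares inner-product affinity with negative squared Euclidean distance on a memory in which one key has a much larger norm than the others, and then reuses a single affinity matrix for two hypothetical objects.

```python
import numpy as np


def softmax(x, axis=0):
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def dot_affinity(mem_keys, query_keys):
    # Inner-product affinity S[i, j] = k_i . q_j, scaled by 1/sqrt(C) (assumed).
    # mem_keys: (C, N) memory keys as columns; query_keys: (C, M).
    c = mem_keys.shape[0]
    return mem_keys.T @ query_keys / np.sqrt(c)


def l2_affinity(mem_keys, query_keys):
    # Negative squared Euclidean distance, expanded as
    #   -||k_i - q_j||^2 = 2 k_i . q_j - ||k_i||^2 - ||q_j||^2.
    # ||q_j||^2 is identical for every memory node i, so it cancels in the
    # softmax over memory and is dropped. The -||k_i||^2 term penalizes
    # large-norm memory keys that would otherwise win every vote.
    c = mem_keys.shape[0]
    k_sq = (mem_keys ** 2).sum(axis=0)[:, None]  # (N, 1)
    return (2.0 * mem_keys.T @ query_keys - k_sq) / np.sqrt(c)


def readout(affinity, mem_values):
    # Softmax over the memory axis turns affinities (N, M) into votes;
    # each query node aggregates memory values: (Cv, N) @ (N, M) -> (Cv, M).
    weights = softmax(affinity, axis=0)
    return mem_values @ weights, weights


def top_vote_coverage(weights):
    # Hypothetical diagnostic: fraction of memory nodes that cast the
    # strongest vote for at least one query node.
    winners = weights.argmax(axis=0)
    return len(np.unique(winners)) / weights.shape[0]


# Toy memory (columns are keys): one large-norm key and two unit-norm keys.
mem_keys = np.array([[3.0, 0.8, 0.6],
                     [0.0, 0.6, 0.8]])
query_keys = mem_keys.copy()  # each query node coincides with one memory key
mem_values = np.eye(3)        # one-hot values, so each readout column equals the votes

for name, fn in [("dot", dot_affinity), ("l2", l2_affinity)]:
    _, w = readout(fn(mem_keys, query_keys), mem_values)
    print(name, w.argmax(axis=0), round(top_vote_coverage(w), 3))
# prints:
# dot [0 0 0] 0.333
# l2 [0 1 2] 1.0

# Keys depend only on frames, so one affinity matrix serves every object;
# only the (hypothetical) per-object value maps differ.
affinity = l2_affinity(mem_keys, query_keys)
per_object_values = [np.eye(3), np.eye(3)[::-1]]
per_object_readouts = [readout(affinity, v)[0] for v in per_object_values]
```

Dropping the per-query term ||q_j||^2 does not change the votes, because the softmax runs over memory nodes and that term is the same for all of them. The remaining -||k_i||^2 term is what keeps the large-norm key from winning every query in the toy example, so each memory node gets the top vote for the query closest to it.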

Pages: 14
