Rethinking Space-Time Networks with Improved Memory Coverage for Efficient Video Object Segmentation

Cited by: 0
Authors
Cheng, Ho Kei [1 ]
Tai, Yu-Wing [2 ]
Tang, Chi-Keung [3 ]
Affiliations
[1] Univ Illinois, Urbana, IL 61801 USA
[2] Kuaishou Technol, Beijing, Peoples R China
[3] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a simple yet effective approach to modeling space-time correspondences in the context of video object segmentation. Unlike most existing approaches, we establish correspondences directly between frames without re-encoding the mask features for every object, leading to a highly efficient and robust framework. With the correspondences, every node in the current query frame is inferred by aggregating features from the past in an associative fashion. We cast the aggregation process as a voting problem and find that the existing inner-product affinity leads to poor use of memory, with a small (fixed) subset of memory nodes dominating the votes regardless of the query. In light of this phenomenon, we propose using the negative squared Euclidean distance instead to compute the affinities. We validate that every memory node now has a chance to contribute, and experimentally show that such diversified voting is beneficial to both memory efficiency and inference accuracy. The synergy of correspondence networks and diversified voting works exceedingly well, achieving new state-of-the-art results on both the DAVIS and YouTubeVOS datasets while running significantly faster, at 20+ FPS for multiple objects, without bells and whistles.
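The voting view in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the affinities are normalized with a softmax over memory nodes (the abstract does not say how), uses toy 2-D keys, and every function name below is hypothetical. It shows one memory key with a large norm winning the vote for every query under the inner product, while the negative squared Euclidean distance lets the nearest key win.

```python
import math


def softmax(scores):
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_affinity(key, query):
    # Inner-product affinity: grows with the norm of the key.
    return sum(k * q for k, q in zip(key, query))


def neg_sq_l2_affinity(key, query):
    # Negative squared Euclidean distance: largest for the closest key.
    return -sum((k - q) ** 2 for k, q in zip(key, query))


def vote_weights(memory_keys, query, affinity):
    # Assumption: votes are a softmax over the affinities to all memory nodes.
    return softmax([affinity(key, query) for key in memory_keys])


def readout(memory_keys, memory_values, query, affinity):
    # Aggregate memory values as a weighted sum, weighted by the votes.
    weights = vote_weights(memory_keys, query, affinity)
    dim = len(memory_values[0])
    return [sum(w * v[d] for w, v in zip(weights, memory_values))
            for d in range(dim)]


def top_voter(weights):
    # Index of the memory node with the largest vote.
    return max(range(len(weights)), key=weights.__getitem__)


# Toy memory: key 0 has a much larger norm than keys 1 and 2.
memory_keys = [[5.0, 5.0], [1.0, 0.0], [0.0, 1.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]

dot_winners = [top_voter(vote_weights(memory_keys, q, dot_affinity))
               for q in queries]
l2_winners = [top_voter(vote_weights(memory_keys, q, neg_sq_l2_affinity))
              for q in queries]

print(dot_winners)  # [0, 0]: the large-norm key dominates both queries
print(l2_winners)   # [1, 2]: each query is served by its nearest key
```

In this toy setup, swapping the affinity function is the only change between the two runs, which is the substitution the abstract proposes.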
Pages: 14
Related papers
50 in total
  • [21] An efficient video object segmentation scheme
    Ong, EP
    Tye, BJ
    Lin, WS
    Etoh, M
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 3361 - 3364
  • [22] Real-time video object segmentation using HSV space
    Li, N
    Bu, JJ
    Chen, C
    2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL II, PROCEEDINGS, 2002, : 85 - 88
  • [23] Optimal space-time coverage and exploration costs in groundwater monitoring networks
    Nunes, LM
    Cunha, MC
    Ribeiro, L
    ENVIRONMENTAL MONITORING AND ASSESSMENT, 2004, 93 (1-3) : 103 - 124
  • [24] Optimal Space-time Coverage and Exploration Costs in Groundwater Monitoring Networks
    L. M. Nunes
    M. C. Cunha
    L. Ribeiro
    Environmental Monitoring and Assessment, 2004, 93 : 103 - 124
  • [25] Space-time completion of video
    Wexler, Yonatan
    Shechtman, Eli
    Irani, Michal
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2007, 29 (03) : 463 - 476
  • [26] Space-time video completion
    Wexler, Y
    Shechtman, E
    Irani, M
    PROCEEDINGS OF THE 2004 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL 1, 2004, : 120 - 127
  • [27] Joint space-time image sequence segmentation: Object tunnels and occlusion volumes
    Ristivojevic, M
    Konrad, J
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS: IMAGE AND MULTIDIMENSIONAL SIGNAL PROCESSING SPECIAL SESSIONS, 2004, : 9 - 12
  • [28] Learning Video Object Segmentation with Visual Memory
    Tokmakov, Pavel
    Alahari, Karteek
    Schmid, Cordelia
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 4491 - 4500
  • [29] Adaptive Memory Management for Video Object Segmentation
    Pourganjalikhan, Ali
    Poullis, Charalambos
    2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022), 2022, : 75 - 82
  • [30] Modulated Memory Network for Video Object Segmentation
    Lu, Hannan
    Guo, Zixian
    Zuo, Wangmeng
    MATHEMATICS, 2024, 12 (06)