Adaptive Sparse Memory Networks for Efficient and Robust Video Object Segmentation

Times cited: 2
Authors
Dang, Jisheng [1, 2]
Zheng, Huicheng [1, 2]
Xu, Xiaohao [3]
Wang, Longguang [4]
Hu, Qingyong [5]
Guo, Yulan [6]
Affiliations
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Key Lab Machine Intelligence & Adv Comp, Minist Educ, Guangzhou 510006, Peoples R China
[2] Sun Yat Sen Univ, Guangdong Key Lab Informat Secur Technol, Guangzhou 510006, Peoples R China
[3] Univ Michigan, Inst Robot, Ann Arbor, MI 48109 USA
[4] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha 410000, Peoples R China
[5] Univ Oxford, Dept Comp Sci, Oxford OX1 2JD, England
[6] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen 518000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Adaptive sparse memory network (ASM); attentive local memory reader (ALMR); video object segmentation (VOS);
DOI
10.1109/TNNLS.2024.3357118
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recently, memory-based networks have achieved promising performance for video object segmentation (VOS). However, existing methods still suffer from unsatisfactory segmentation accuracy and inferior efficiency. The reasons are mainly twofold: 1) during memory construction, the inflexible memory storage mechanism results in a weak discriminative ability for similar appearances in complex scenarios, leading to video-level temporal redundancy, and 2) during memory reading, matching robustness and memory retrieval accuracy decrease as the number of video frames increases. To address these challenges, we propose an adaptive sparse memory network (ASM) that efficiently and effectively performs VOS by sparsely leveraging previous guidance while attending to key information. Specifically, we design an adaptive sparse memory constructor (ASMC) to adaptively memorize informative past frames according to dynamic temporal changes in video frames. Furthermore, we introduce an attentive local memory reader (ALMR) to quickly retrieve relevant information using a subset of memory, thereby reducing frame-level redundant computation and noise in a simpler and more convenient manner. To prevent key features from being discarded by the subset of memory, we further propose a novel attentive local feature aggregation (ALFA) module, which preserves useful cues by selectively aggregating discriminative spatial dependence from adjacent frames, thereby effectively increasing the receptive field of each memory frame. Extensive experiments demonstrate that our model achieves state-of-the-art performance with real-time speed on six popular VOS benchmarks. Furthermore, our ASM can be applied to existing memory-based methods as generic plugins to achieve significant performance improvements. More importantly, our method exhibits robustness in handling sparse videos with low frame rates.
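The abstract does not say how ASMC decides which past frames to memorize or how ALMR chooses its "subset of memory". The Python sketch below only illustrates the general idea under assumed choices: each frame is reduced to a single feature vector, "dynamic temporal change" is measured as cosine distance to the most recently stored frame against a fixed `threshold`, and the memory subset is simply the last `window` stored entries read with scaled dot-product attention. The function names and parameters (`build_sparse_memory`, `read_local_memory`, `threshold`, `window`) are hypothetical. This is not the authors' implementation, and ALFA is not modeled.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; a zero vector is treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def build_sparse_memory(frames: Sequence[Vector], threshold: float) -> List[int]:
    """Hypothetical adaptive memory construction (assumed criterion).

    The first (annotated) frame is always stored. A later frame is stored
    only if it differs from the most recently stored frame by more than
    `threshold`, so near-duplicate frames never enter memory.
    """
    if not frames:
        return []
    kept = [0]
    for t in range(1, len(frames)):
        if cosine_distance(frames[t], frames[kept[-1]]) > threshold:
            kept.append(t)
    return kept


def read_local_memory(query: Vector, memory: Sequence[Vector], window: int) -> Vector:
    """Hypothetical local memory read over a memory subset.

    Scaled dot-product attention restricted to the last `window` entries
    (assumes window >= 1 and non-empty memory). Keys double as values to
    keep the sketch short.
    """
    subset = list(memory[-window:])
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in subset]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * key[d] for w, key in zip(weights, subset)) for d in range(len(query))]


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 1.0], [1.0, 1.0]]
    kept = build_sparse_memory(frames, threshold=0.1)
    print(kept)  # [0, 2, 4]
    memory = [frames[i] for i in kept]
    print(read_local_memory([0.0, 1.0], memory, window=2))  # [0.5, 1.0]
```

In this toy run, frames 1 and 3 are near-duplicates of the frames stored just before them and are skipped, and the read attends only to the two most recent memory entries. Both choices stand in for the adaptive and attentive mechanisms the abstract describes, whose actual criteria are given in the full paper.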
Pages: 3820-3833
Number of pages: 14
Related papers
50 in total
  • [1] Robust and Efficient Memory Network for Video Object Segmentation
    Chen, Yadang
    Zhang, Dingwei
    Yang, Zhi-Xin
    Wu, Enhua
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1769 - 1774
  • [2] Temporo-Spatial Parallel Sparse Memory Networks for Efficient Video Object Segmentation
    Dang, Jisheng
    Zheng, Huicheng
    Wang, Bimei
    Wang, Longguang
    Guo, Yulan
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (11) : 17291 - 17304
  • [3] Video Object Segmentation with Dynamic Memory Networks and Adaptive Object Alignment
    Liang, Shuxian
    Shen, Xu
    Huang, Jianqiang
    Hua, Xian-Sheng
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 8045 - 8054
  • [4] Memory Aggregation Networks for Efficient Interactive Video Object Segmentation
    Miao, Jiaxu
    Wei, Yunchao
    Yang, Yi
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, : 10363 - 10372
  • [5] Boosting Video Object Segmentation via Robust and Efficient Memory Network
    Chen, Yadang
    Zhang, Dingwei
    Zheng, Yuhui
    Yang, Zhi-Xin
    Wu, Enhua
    Zhao, Haixing
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (05) : 3340 - 3352
  • [6] Adaptive Memory Management for Video Object Segmentation
    Pourganjalikhan, Ali
    Poullis, Charalambos
    2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022), 2022, : 75 - 82
  • [7] Towards Robust Video Object Segmentation with Adaptive Object Calibration
    Xu, Xiaohao
    Wang, Jinglu
    Ming, Xiang
    Lu, Yan
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 2709 - 2718
  • [8] SpVOS: Efficient Video Object Segmentation With Triple Sparse Convolution
    Lin, Weihao
    Chen, Tao
    Yu, Chong
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 5977 - 5991
  • [9] Efficient Regional Memory Network for Video Object Segmentation
    Xie, Haozhe
    Yao, Hongxun
    Zhou, Shangchen
    Zhang, Shengping
    Sun, Wenxiu
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 1286 - 1295
  • [10] Efficient and Robust Video Object Segmentation Through Isogenous Memory Sampling and Frame Relation Mining
    Dang, Jisheng
    Zheng, Huicheng
    Lai, Jinming
    Yan, Xu
    Guo, Yulan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 3924 - 3938