Temporally Efficient Gabor Transformer for Unsupervised Video Object Segmentation

Cited by: 1
Authors
Fan, Jiaqing [1]
Su, Tiankang [2]
Zhang, Kaihua [3]
Liu, Bo [4]
Liu, Qingshan [5]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Automat, Nanjing, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Comp & Sci, Minist Educ, Engn Res Ctr Digital Forens, Nanjing, Peoples R China
[4] Walmart Global Tech, Sunnyvale, CA USA
[5] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing, Peoples R China
Keywords
Unsupervised video object segmentation; Gabor filtering; Video Transformer; Spatio-temporal information selection
DOI
10.1145/3581783.3612017
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spatial-temporal structural details of targets in video (e.g., edges and textures that vary over time) are essential to accurate Unsupervised Video Object Segmentation (UVOS). The vanilla multi-head self-attention in Transformer-based UVOS methods usually concentrates on learning general low-frequency information (e.g., illumination, color) while neglecting high-frequency texture details, leading to unsatisfactory segmentation results. To address this issue, this paper presents a Temporally efficient Gabor Transformer (TGFormer) for UVOS. TGFormer jointly models spatial dependencies and temporal coherence within and across frames, which allows it to fully capture the rich structural details needed for accurate UVOS. Concretely, we first propose an effective learnable Gabor filtering Transformer to mine the structural texture details of the object. Then, to adaptively handle redundant historical information from neighboring frames, we present an efficient dynamic neighboring frame selection module that automatically chooses useful temporal information, which both mitigates the effect of blurry frames and reduces the computational burden. Finally, we build the UVOS model as a fully Transformer-based architecture that aggregates information from the spatial, Gabor, and temporal domains, yielding a strong representation with rich structural details. Extensive experiments on five mainstream UVOS benchmarks (DAVIS2016, FBMS, DAVSOD, ViSal, and MCL) demonstrate the superiority of the proposed solution over state-of-the-art methods.
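The abstract's central claim is that Gabor filtering captures oriented, high-frequency texture that low-frequency-biased attention tends to miss. As a rough illustration only, the sketch below builds a classical, fixed, real-valued Gabor kernel (a Gaussian envelope times a cosine carrier) in plain Python. It is not TGFormer's learnable Gabor filtering Transformer; the function names `gabor_kernel` and `response` and all parameter values are hypothetical. With a sine-phase carrier (psi = pi/2) the kernel sums to roughly zero, so a constant patch gives no response, while stripes matching the kernel's orientation and wavelength give a strong one and stripes at the orthogonal orientation give almost none.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=math.pi / 2):
    """Build a size x size real Gabor kernel as a list of rows (hypothetical helper).

    g(x, y) = exp(-(x'^2 + gamma^2 * y'^2) / (2 * sigma^2)) * cos(2*pi*x'/lambd + psi),
    where (x', y') is (x, y) rotated by theta.
    With psi = pi/2 the carrier is odd, so the entries sum to (almost) zero
    and a constant patch produces no response.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel is centred")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the sampling grid so the carrier oscillates along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def response(patch, kernel):
    """Correlate an equally sized patch with the kernel at a single position."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))


# Hypothetical demo values: a 15x15 kernel tuned to a 6-pixel wavelength.
SIZE, LAMBD = 15, 6.0
HALF = SIZE // 2
flat = [[0.5] * SIZE for _ in range(SIZE)]  # constant patch: low-frequency content only
stripes = [[math.sin(2 * math.pi * (c - HALF) / LAMBD) for c in range(SIZE)]
           for _ in range(SIZE)]  # intensity varies along x (vertical stripes)

k_theta0 = gabor_kernel(SIZE, sigma=3.0, theta=0.0, lambd=LAMBD)
k_theta90 = gabor_kernel(SIZE, sigma=3.0, theta=math.pi / 2, lambd=LAMBD)

flat_resp = response(flat, k_theta0)       # near zero: kernel ignores the constant level
stripe_resp = response(stripes, k_theta0)  # large magnitude: orientation and wavelength match
rotated_resp = response(stripes, k_theta90)  # near zero: orthogonal orientation
print(f"flat={flat_resp:.2e} stripes={stripe_resp:.2f} rotated={rotated_resp:.2e}")
```

Here sigma, theta, lambd, and gamma are fixed by hand. A learnable variant would presumably treat some of them as trainable parameters, but the abstract does not say how TGFormer parameterizes its Gabor filters, so this sketch should not be read as the paper's design.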
Pages: 3394-3402
Number of pages: 9