Temporally Efficient Gabor Transformer for Unsupervised Video Object Segmentation

Cited by: 1
Authors
Fan, Jiaqing [1]
Su, Tiankang [2]
Zhang, Kaihua [3]
Liu, Bo [4]
Liu, Qingshan [5]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Automat, Nanjing, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Comp & Sci, Minist Educ, Engn Res Ctr Digital Forens, Nanjing, Peoples R China
[4] Walmart Global Tech, Sunnyvale, CA USA
[5] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing, Peoples R China
Keywords
Unsupervised video object segmentation; Gabor filtering; Video Transformer; Spatio-temporal information selection;
DOI
10.1145/3581783.3612017
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spatial-temporal structural details of targets in video (e.g., edges and textures that vary over time) are essential to accurate Unsupervised Video Object Segmentation (UVOS). The vanilla multi-head self-attention in Transformer-based UVOS methods usually concentrates on learning general low-frequency information (e.g., illumination, color) while neglecting high-frequency texture details, leading to unsatisfactory segmentation results. To address this issue, this paper presents a Temporally efficient Gabor Transformer (TGFormer) for UVOS. TGFormer jointly models intra-frame spatial dependencies and inter-frame temporal coherence, which allows it to capture rich structural details for accurate UVOS. Concretely, we first propose an effective learnable Gabor filtering Transformer to mine the structural texture details of the object. Then, to adaptively store the redundant neighboring historical information, we present an efficient dynamic neighboring frame selection module that automatically chooses the useful temporal information, which simultaneously mitigates the effect of blurry frames and reduces the computational burden. Finally, we build the UVOS model as a fully Transformer-based architecture that aggregates information from the space, Gabor, and time domains, yielding a strong representation with rich structural details. Extensive experiments on five mainstream UVOS benchmarks (DAVIS2016, FBMS, DAVSOD, ViSal, and MCL) demonstrate the superiority of the presented solution over state-of-the-art methods.
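As background for the Gabor filtering mentioned in the abstract, the following is a minimal sketch of a classical, fixed 2D Gabor kernel in pure Python. It is not the learnable Gabor filtering Transformer of TGFormer, whose details are not given in this record; the function names (`gabor_kernel`, `filter_response`) and all parameter values are assumptions chosen for illustration. The sketch only shows why such a filter responds to oriented, high-frequency texture.

```python
import math


def gabor_kernel(size, sigma, theta, wavelength, gamma=0.5, psi=0.0):
    """Return a size x size real-valued Gabor kernel as a list of rows.

    sigma: width of the Gaussian envelope
    theta: orientation of the sinusoidal carrier, in radians
    wavelength: period of the carrier, in pixels
    gamma: spatial aspect ratio of the envelope
    psi: phase offset of the carrier
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a center pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates so the carrier varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def filter_response(patch, kernel):
    """Correlate one patch with a kernel of the same shape (single output value)."""
    return sum(p * k for prow, krow in zip(patch, kernel) for p, k in zip(prow, krow))


# Illustrative (hypothetical) settings: a 9x9 patch of vertical stripes, period 4 px.
SIZE, WAVELENGTH, SIGMA = 9, 4.0, 2.0
HALF = SIZE // 2
stripes = [[math.cos(2 * math.pi * x / WAVELENGTH) for x in range(-HALF, HALF + 1)]
           for _ in range(SIZE)]

# A kernel whose carrier matches the stripes, and one rotated by 90 degrees.
aligned = filter_response(stripes, gabor_kernel(SIZE, SIGMA, 0.0, WAVELENGTH))
rotated = filter_response(stripes, gabor_kernel(SIZE, SIGMA, math.pi / 2, WAVELENGTH))
print(f"aligned: {aligned:.3f}  rotated: {rotated:.3f}")
```

The kernel whose carrier matches the stripe pattern gives a much larger response than the one rotated by 90 degrees, which is the orientation- and frequency-selective behavior that makes Gabor filters suited to texture and edge detail.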
Pages: 3394-3402
Number of pages: 9
Related papers
50 in total
  • [1] Temporally Efficient Vision Transformer for Video Instance Segmentation
    Yang, Shusheng
    Wang, Xinggang
    Li, Yu
    Fang, Yuxin
    Fang, Jiemin
    Liu, Wenyu
    Zhao, Xun
    Shan, Ying
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 2875 - 2885
  • [2] Learning Motion Guidance for Efficient Unsupervised Video Object Segmentation
    Zhao Z.-C.
    Zhang K.-H.
    Fan J.-Q.
    Liu Q.-S.
    Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49 (04): : 872 - 880
  • [3] Unsupervised video segmentation and object tracking
    Sista, S
    Kashyap, RL
    COMPUTERS IN INDUSTRY, 2000, 42 (2-3) : 127 - 146
  • [4] Unsupervised object segmentation in video by efficient selection of highly probable positive features
    Haller, Emanuela
    Leordeanu, Marius
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5095 - 5103
  • [5] Reciprocal Transformations for Unsupervised Video Object Segmentation
    Ren, Sucheng
    Liu, Wenxi
    Liu, Yongtuo
    Chen, Haoxin
    Han, Guoqiang
    He, Shengfeng
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 15450 - 15459
  • [6] Anchor Diffusion for Unsupervised Video Object Segmentation
    Yang, Zhao
    Wang, Qiang
    Bertinetto, Luca
    Hu, Weiming
    Bai, Song
    Torr, Philip H. S.
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 931 - 940
  • [7] Unsupervised Video Object Segmentation by Supertrajectory Labeling
    Masuda, Masahiro
    Mochizuki, Yoshihiko
    Ishikawa, Hiroshi
    PROCEEDINGS OF THE FIFTEENTH IAPR INTERNATIONAL CONFERENCE ON MACHINE VISION APPLICATIONS - MVA2017, 2017, : 448 - 451
  • [8] Unsupervised video object segmentation with mask transformer: boosting accuracy and efficiency through feature fusion
    Qu, Daikun
    Zhao, Hongwei
    Zhou, Mingzhu
    VISUAL COMPUTER, 2025,
  • [9] Efficient Long-Short Temporal Attention network for unsupervised Video Object Segmentation
    Li, Ping
    Zhang, Yu
    Yuan, Li
    Xiao, Huaxin
    Lin, Binbin
    Xu, Xianghua
    PATTERN RECOGNITION, 2024, 146
  • [10] Learning Depth Signal Guided Mixed Transformer for High-Performance Unsupervised Video Object Segmentation
    Su T.-K.
    Song H.-H.
    Fan J.-Q.
    Zhang K.-H.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2023, 51 (05): : 1388 - 1395