Temporally Efficient Gabor Transformer for Unsupervised Video Object Segmentation

Cited by: 1
Authors
Fan, Jiaqing [1 ]
Su, Tiankang [2 ]
Zhang, Kaihua [3 ]
Liu, Bo [4 ]
Liu, Qingshan [5 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Automat, Nanjing, Peoples R China
[3] Nanjing Univ Informat Sci & Technol, Sch Comp & Sci, Minist Educ, Engn Res Ctr Digital Forens, Nanjing, Peoples R China
[4] Walmart Global Tech, Sunnyvale, CA USA
[5] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing, Peoples R China
Keywords
Unsupervised video object segmentation; Gabor filtering; Video Transformer; Spatio-temporal information selection;
DOI
10.1145/3581783.3612017
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spatial-temporal structural details of targets in video (e.g., edges and textures that vary over time) are essential to accurate Unsupervised Video Object Segmentation (UVOS). The vanilla multi-head self-attention in Transformer-based UVOS methods usually concentrates on learning general low-frequency information (e.g., illumination, color) while neglecting high-frequency texture details, leading to unsatisfactory segmentation results. To address this issue, this paper presents a Temporally efficient Gabor Transformer (TGFormer) for UVOS. TGFormer jointly models intra-frame spatial dependencies and inter-frame temporal coherence, which allows it to fully capture the rich structural details needed for accurate UVOS. Concretely, we first propose an effective learnable Gabor filtering Transformer to mine the structural texture details of the object. Then, to adaptively store neighboring historical information, which is often redundant, we present an efficient dynamic neighboring frame selection module that automatically chooses the useful temporal information; this simultaneously mitigates the effect of blurry frames and reduces the computational burden. Finally, we build the UVOS model as a fully Transformer architecture that aggregates information from the spatial, Gabor, and temporal domains, yielding a strong representation with rich structural details. Extensive experiments on five mainstream UVOS benchmarks (DAVIS2016, FBMS, DAVSOD, ViSal, and MCL) demonstrate the superiority of the presented solution over state-of-the-art methods.
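To illustrate why Gabor filtering is associated with high-frequency texture, the following is a minimal sketch of a classical, fixed 2-D Gabor kernel in NumPy. It is not the paper's learnable Gabor filtering Transformer; the function names (`gabor_kernel`, `filter_response`) and all parameter values are assumptions chosen for illustration. The sketch shows that a zero-mean Gabor kernel responds strongly to a striped texture at its tuned wavelength and gives essentially no response on a flat, uniformly lit region.

```python
import numpy as np

def gabor_kernel(size=9, sigma=2.0, theta=0.0, lam=4.0, gamma=0.5, psi=0.0):
    """Real part of a 2-D Gabor filter: Gaussian envelope times a cosine carrier.

    All default values are illustrative assumptions, not taken from the paper.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    # Rotate coordinates so the carrier oscillates along direction theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / lam + psi)
    return envelope * carrier

def filter_response(image, kernel):
    """Mean absolute response of a valid-mode cross-correlation."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    out = np.einsum("ijkl,kl->ij", windows, kernel)
    return float(np.abs(out).mean())

kernel = gabor_kernel()
kernel -= kernel.mean()  # zero-mean, so constant regions produce no response

cols = np.arange(32)
stripes = np.tile(np.cos(2.0 * np.pi * cols / 4.0), (32, 1))  # texture with period 4
flat = np.ones((32, 32))  # uniform region with no texture

stripe_response = filter_response(stripes, kernel)
flat_response = filter_response(flat, kernel)
print(stripe_response > flat_response)
```

In a learnable variant, parameters such as `sigma`, `theta`, and `lam` would be optimized during training rather than fixed by hand; how the paper parameterizes them is not stated in the abstract.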
Pages: 3394-3402
Page count: 9