SGDViT: Saliency-Guided Dynamic Vision Transformer for UAV Tracking

Cited by: 11
Authors
Yao, Liangliang [1 ]
Fu, Changhong [1 ]
Li, Sihang [1 ]
Zheng, Guangze [2 ]
Ye, Junjie [1 ]
Affiliations
[1] Tongji Univ, Sch Mech Engn, Shanghai 201804, Peoples R China
[2] Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
Shanghai Natural Science Foundation; National Natural Science Foundation of China
Keywords
DOI
10.1109/ICRA48891.2023.10161487
CLC classification
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Vision-based object tracking has boosted extensive autonomous applications for unmanned aerial vehicles (UAVs). However, the dynamic changes in flight maneuvers and viewpoints encountered in UAV tracking pose significant difficulties, e.g., aspect ratio change and scale variation. The conventional cross-correlation operation, while commonly used, has limitations in effectively capturing perceptual similarity and incorporates extraneous background information. To mitigate these limitations, this work presents a novel saliency-guided dynamic vision Transformer (SGDViT) for UAV tracking. The proposed method designs a new task-specific object saliency mining network to refine the cross-correlation operation and effectively discriminate foreground from background information. Additionally, a saliency adaptation embedding operation dynamically generates tokens based on initial saliency, thereby reducing the computational complexity of the Transformer architecture. Finally, a lightweight saliency filtering Transformer further refines saliency information and increases the focus on appearance information. The efficacy and robustness of the proposed approach have been thoroughly assessed through experiments on three widely-used UAV tracking benchmarks and real-world scenarios, with results demonstrating its superiority. The source code and demo videos are available at https://github.com/vision4robotics/SGDViT.
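The abstract's "saliency adaptation embedding" dynamically generates tokens based on saliency scores, which shrinks the Transformer's attention cost. A minimal sketch of the general idea — keeping only the most salient patch tokens — is shown below; the function name, shapes, and top-k selection strategy are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def saliency_guided_tokens(feature_map, saliency_map, keep_ratio=0.5):
    """Illustrative sketch: keep only the most salient patch tokens.

    feature_map  : (N, C) array of N patch embeddings with C channels
    saliency_map : (N,) array of per-patch saliency scores
    keep_ratio   : fraction of tokens retained for the Transformer
    """
    num_tokens = feature_map.shape[0]
    k = max(1, int(num_tokens * keep_ratio))
    # Indices of the k most salient patches (descending saliency).
    top_idx = np.argsort(saliency_map)[::-1][:k]
    # Restore spatial order so positional structure is preserved.
    top_idx = np.sort(top_idx)
    return feature_map[top_idx], top_idx
```

Since self-attention cost grows quadratically with token count, halving the tokens in this sketch would cut attention FLOPs roughly fourfold, which matches the abstract's motivation of reducing Transformer complexity for onboard UAV compute.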
Pages: 3353 - 3359
Page count: 7
Related papers
50 items total
  • [31] Saliency-Guided Object Candidates Based on Gestalt Principles
    Werner, Thomas
    Martin-Garcia, German
    Frintrop, Simone
    COMPUTER VISION SYSTEMS (ICVS 2015), 2015, 9163 : 34 - 44
  • [32] Saliency-guided feature learning in a probabilistic categorization task
    Chenkov, N.
    Nelson, J-D
    PERCEPTION, 2010, 39 : 55 - 56
  • [33] Saliency-Guided Unsupervised Feature Learning for Scene Classification
    Zhang, Fan
    Du, Bo
    Zhang, Liangpei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2015, 53 (04): 2175 - 2184
  • [34] Saliency-Guided Quality Assessment of Screen Content Images
    Gu, Ke
    Wang, Shiqi
    Yang, Huan
    Lin, Weisi
    Zhai, Guangtao
    Yang, Xiaokang
    Zhang, Wenjun
    IEEE TRANSACTIONS ON MULTIMEDIA, 2016, 18 (06) : 1098 - 1110
  • [35] Tracking With Saliency Region Transformer
    Liu, Tianpeng
    Li, Jing
    Wu, Jia
    Zhang, Lefei
    Chang, Jun
    Wan, Jun
    Lian, Lezhi
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33 : 285 - 296
  • [36] Saliency-Guided Remote Sensing Image Super-Resolution
    Liu, Baodi
    Zhao, Lifei
    Li, Jiaoyue
    Zhao, Hengle
    Liu, Weifeng
    Li, Ye
    Wang, Yanjiang
    Chen, Honglong
    Cao, Weijia
    REMOTE SENSING, 2021, 13 (24)
  • [37] Saliency-Guided Stereo Camera Control for Comfortable VR Explorations
    Yoon, Yeo-Jin
    No, Jaechun
    Choi, Soo-Mi
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2017, E100D (09): 2245 - 2248
  • [38] Saliency-guided neural prosthesis for visual attention: Design and simulation
    Yoshida, Masatoshi
    Veale, Richard
    NEUROSCIENCE RESEARCH, 2014, 78 : 90 - 94
  • [39] Saliency-guided level set model for automatic object segmentation
    Cai, Qing
    Liu, Huiying
    Qian, Yiming
    Zhou, Sanping
    Duan, Xiaojun
    Yang, Yee-Hong
    PATTERN RECOGNITION, 2019, 93 : 147 - 163
  • [40] Saliency-guided convolution neural network-transformer fusion network for no-reference image quality assessment
    Wu, Lipeng
    Cui, Ziguan
    Gan, Zongliang
    Tang, Guijin
    Liu, Feng
    JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (06)