Transformer for multiple object tracking: Exploring locality to vision

Cited by: 7
Authors
Wu, Shan [1 ]
Hadachi, Amnir [1 ]
Lu, Chaoru [2 ]
Vivet, Damien [3 ]
Affiliations
[1] Univ Tartu, Inst Comp Sci, ITS Lab, Narva mnt 18, EE-51009 Tartu, Estonia
[2] Oslo Metropolitan Univ, Ctr Metropolitan Digitalizat & Smartizat MetSmart, Dept Built Environm, Pilestredet 46, N-0167 Oslo, Norway
[3] Univ Toulouse, ISAE SUPAERO, 10 Ave Edouard Belin, F-31400 Toulouse, France
Keywords
Multi-object tracking; Transformer; Deep learning; Locality to vision;
DOI
10.1016/j.patrec.2023.04.016
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-object tracking (MOT) is a critical task in various domains, such as traffic analysis, surveillance, and autonomous vehicles. The joint-detection-and-tracking paradigm has been extensively researched; it is faster and more convenient to train and deploy than the classic tracking-by-detection paradigm while achieving state-of-the-art performance. This paper explores enhancing an MOT system by combining the prevailing convolutional neural network (CNN) with a novel vision transformer technique, Locality. Transformers adopted for computer vision tasks have several deficiencies: while they excel at modeling global information over long embeddings, they lack a locality mechanism for learning local features. This can lead to neglect of small objects, which may cause safety issues. We combine the TransTrack MOT system with a locality mechanism inspired by LocalViT and find that the locality-enhanced system outperforms the baseline TransTrack by 5.3% MOTA on the MOT17 dataset. (c) 2023 Elsevier B.V. All rights reserved.
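The record does not include an implementation, but the locality mechanism the abstract borrows from LocalViT amounts to inserting a depthwise convolution between the two pointwise layers of a transformer's feed-forward block, so each token mixes with its spatial neighbors instead of being processed in isolation. A minimal NumPy sketch of that idea, assuming a 3x3 depthwise kernel, zero padding, and ReLU activations (the names `depthwise_conv3x3` and `locality_ffn` are illustrative, not from the paper):

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """Per-channel 3x3 convolution with zero padding.

    x: feature map of shape (H, W, C); kernels: shape (3, 3, C),
    one 3x3 filter per channel (no cross-channel mixing).
    """
    H, W, C = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            # Shifted view of the padded map, weighted by kernel tap (i, j)
            out += padded[i:i + H, j:j + W, :] * kernels[i, j, :]
    return out

def locality_ffn(tokens, H, W, w1, w2, dw_kernels):
    """LocalViT-style feed-forward: expand -> depthwise conv -> project.

    tokens: (H*W, d) sequence of patch embeddings; w1: (d, d_hidden)
    expansion; w2: (d_hidden, d) projection; dw_kernels: (3, 3, d_hidden).
    """
    hidden = np.maximum(tokens @ w1, 0.0)      # pointwise expansion + ReLU
    hidden = hidden.reshape(H, W, -1)          # restore the 2-D patch grid
    hidden = np.maximum(depthwise_conv3x3(hidden, dw_kernels), 0.0)
    hidden = hidden.reshape(H * W, -1)         # back to a token sequence
    return hidden @ w2                         # pointwise projection
```

With a kernel that is 1 at the center tap and 0 elsewhere, the depthwise stage is the identity and the block reduces to a standard pointwise feed-forward, which makes the locality term easy to ablate against a plain transformer baseline.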
Pages: 70-76 (7 pages)