Transformer for multiple object tracking: Exploring locality to vision

Cited by: 7
Authors
Wu, Shan [1]
Hadachi, Amnir [1]
Lu, Chaoru [2]
Vivet, Damien [3]
Affiliations
[1] Univ Tartu, Inst Comp Sci, ITS Lab, Narva mnt 18, EE-51009 Tartu, Estonia
[2] Oslo Metropolitan Univ, Ctr Metropolitan Digitalizat & Smartizat MetSmart, Dept Built Environm, Pilestredet 46, N-0167 Oslo, Norway
[3] Univ Toulouse, ISAE SUPAERO, 10 Ave Edouard Belin, F-31400 Toulouse, France
Keywords
Multi-object tracking; Transformer; Deep learning; Locality to vision
DOI
10.1016/j.patrec.2023.04.016
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-object tracking (MOT) is a critical task in various domains, such as traffic analysis, surveillance, and autonomous vehicles. The joint-detection-and-tracking paradigm has been extensively researched, which is faster and more convenient for training and deploying over the classic tracking-by-detection paradigm while achieving state-of-the-art performance. This paper explores the possibilities of enhancing the MOT system by leveraging the prevailing convolutional neural network (CNN) and a novel vision transformer technique Locality. There are several deficiencies in the transformer adopted for computer vision tasks. While the transformers are good at modeling global information for a long embedding, the locality mechanism, which learns the local features, is missing. This could lead to negligence of small objects, which may cause security issues. We combine the TransTrack MOT system with the locality mechanism inspired by LocalViT and find that the locality-enhanced system outperforms the baseline TransTrack by 5.3% MOTA on the MOT17 dataset. (c) 2023 Elsevier B.V. All rights reserved.
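The locality idea mentioned in the abstract can be illustrated with a small sketch. It assumes the mechanism resembles LocalViT's approach of reshaping the token sequence back into a 2D grid and applying a 3x3 depthwise convolution, so that each token mixes with its spatial neighbours. This is not the authors' implementation: the function names (`tokens_to_grid`, `depthwise_conv3x3`, `local_mixing`) are hypothetical, the kernels are hand-set rather than learned, and the 1x1 expansion, projection, and activation layers of a full feed-forward block are omitted.

```python
from typing import List

Token = List[float]            # one embedding vector (length = channels)
Grid = List[List[Token]]       # indexed as [row][col][channel]
Kernels = List[List[List[float]]]  # one 3x3 kernel per channel


def tokens_to_grid(tokens: List[Token], height: int, width: int) -> Grid:
    """Reshape a flat, row-major token sequence into a height x width grid."""
    if len(tokens) != height * width:
        raise ValueError("token count must equal height * width")
    return [tokens[r * width:(r + 1) * width] for r in range(height)]


def depthwise_conv3x3(grid: Grid, kernels: Kernels) -> Grid:
    """Apply a separate 3x3 kernel to each channel (zero padding, stride 1)."""
    height, width, channels = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0.0] * channels for _ in range(width)] for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for ch in range(channels):
                acc = 0.0
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        # Positions outside the grid count as zero.
                        if 0 <= rr < height and 0 <= cc < width:
                            acc += kernels[ch][dr + 1][dc + 1] * grid[rr][cc][ch]
                out[r][c][ch] = acc
    return out


def local_mixing(tokens: List[Token], height: int, width: int,
                 kernels: Kernels) -> List[Token]:
    """Tokens -> 2D grid -> depthwise 3x3 conv -> tokens (hypothetical helper)."""
    grid = tokens_to_grid(tokens, height, width)
    mixed = depthwise_conv3x3(grid, kernels)
    return [tok for row in mixed for tok in row]


if __name__ == "__main__":
    # A 3x3 grid of single-channel tokens with values 1..9.
    tokens = [[float(i)] for i in range(1, 10)]
    sum_kernel = [[[1.0] * 3 for _ in range(3)]]
    print(local_mixing(tokens, 3, 3, sum_kernel))
```

With the all-ones kernel, each output token becomes the sum of its 3x3 neighbourhood, which shows how a convolution injects information from spatially adjacent tokens that plain self-attention treats no differently from distant ones.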
Pages: 70-76
Page count: 7