TM2B: Transformer-Based Motion-to-Box Network for 3D Single Object Tracking on Point Clouds

Cited by: 0
Authors
Xu, Anqi [1 ]
Nie, Jiahao [1 ]
He, Zhiwei [1 ]
Lv, Xudong [1 ]
Affiliations
[1] Sch Hangzhou Dianzi Univ, Hangzhou 310018, Peoples R China
Source
IEEE ROBOTICS AND AUTOMATION LETTERS | 2024, Vol. 9, Issue 08
Keywords
Transformers; Accuracy; Three-dimensional displays; Target tracking; Object tracking; Feature extraction; Point cloud compression; 3D single object tracking; motion-to-box; transformer;
DOI
10.1109/LRA.2024.3418274
Chinese Library Classification
TP24 [Robotics]
Discipline codes
080202; 1405
Abstract
3D single object tracking plays a crucial role in numerous applications such as autonomous driving. Recent trackers built on the motion-centric paradigm perform well because they exploit motion cues to infer the target's relative motion across successive frames, effectively overcoming the significant appearance variations of targets and distractors caused by occlusion. However, this paradigm typically requires a multi-stage motion-to-box process to refine the motion cues, which entails tedious hyper-parameter tuning and elaborate subtask designs. In this letter, we propose a novel transformer-based motion-to-box network (TM2B), which employs a learnable relation modeling transformer (LRMT) to generate accurate motion cues without multi-stage refinement. The LRMT contains two novel attention mechanisms: hierarchical interactive attention and learnable query attention. The former builds a learnable, fixed-size sampling set for each query on multi-scale feature maps, enabling each query to adaptively select prominent sampling elements and thus encode multi-scale features in a lightweight manner; the latter computes a weighted sum of the encoded features with a learnable global query, extracting valuable motion cues from all available features and thereby achieving accurate tracking. Extensive experiments demonstrate that TM2B achieves state-of-the-art performance on KITTI, NuScenes and the Waymo Open Dataset while significantly improving inference speed over previous leading methods, reaching 56.8 FPS on a single NVIDIA 1080Ti GPU. The code is available at TM2B.
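The abstract does not give implementation details for the two attention mechanisms, so the following is only a rough numpy sketch under assumed shapes: a fixed-size top-k sampling step stands in for hierarchical interactive attention (the letter's actual selection scheme is not specified here), and a single trained global query aggregating the sampled features stands in for learnable query attention. All function names and sizes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sample_topk(query, feats, k):
    """Fixed-size adaptive sampling: keep the k elements most relevant
    to the query (a stand-in for hierarchical interactive attention)."""
    scores = (query @ feats.T).ravel()       # (N,) relevance per element
    idx = np.argpartition(scores, -k)[-k:]   # indices of the top-k elements
    return feats[idx]                        # (k, d) sampled subset

def learnable_query_attention(query, feats):
    """Weighted sum of encoded features with a (trained) global query."""
    d = query.shape[-1]
    w = softmax(query @ feats.T / np.sqrt(d))  # (1, k) attention weights
    return w @ feats                           # (1, d) motion-cue vector

rng = np.random.default_rng(0)
# Three assumed feature-map scales, flattened into one element set.
multi_scale = [rng.standard_normal((n, 64)) for n in (256, 64, 16)]
feats = np.concatenate(multi_scale)          # (336, 64)
q = rng.standard_normal((1, 64))             # learnable global query
sampled = sample_topk(q, feats, k=32)        # fixed-size sampling set
cue = learnable_query_attention(q, sampled)  # aggregated motion cue
print(cue.shape)                             # (1, 64)
```

In the paper the motion cue produced this way would then be decoded directly into a target box, replacing the multi-stage motion-to-box refinement of earlier motion-centric trackers.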
Pages: 7078-7085
Page count: 8