TM2B: Transformer-Based Motion-to-Box Network for 3D Single Object Tracking on Point Clouds

Cited: 0
Authors
Xu, Anqi [1 ]
Nie, Jiahao [1 ]
He, Zhiwei [1 ]
Lv, Xudong [1 ]
Affiliations
[1] Sch Hangzhou Dianzi Univ, Hangzhou 310018, Peoples R China
Source
IEEE ROBOTICS AND AUTOMATION LETTERS | 2024, Vol. 9, No. 8
Keywords
Transformers; Accuracy; Three-dimensional displays; Target tracking; Object tracking; Feature extraction; Point cloud compression; 3D single object tracking; motion-to-box; transformer
DOI
10.1109/LRA.2024.3418274
CLC classification number
TP24 [Robotics]
Discipline code
080202; 1405
Abstract
3D single object tracking plays a crucial role in numerous applications such as autonomous driving. Recent trackers based on the motion-centric paradigm perform well because they exploit motion cues to infer the target's relative motion across successive frames, effectively overcoming the significant appearance variations of targets and distractors caused by occlusion. However, this paradigm typically requires multi-stage motion-to-box refinement of the motion cues, which entails tedious hyper-parameter tuning and elaborate subtask designs. In this letter, we propose a novel transformer-based motion-to-box network (TM2B), which employs a learnable relation modeling transformer (LRMT) to generate accurate motion cues without multi-stage refinement. The proposed LRMT contains two novel attention mechanisms: hierarchical interactive attention and learnable query attention. The former builds a learnable, fixed-size sampling set for each query on multi-scale feature maps, letting each query adaptively select prominent sampling elements and thereby encode multi-scale features in a lightweight manner; the latter computes a weighted sum of the encoded features with learnable global queries, extracting valuable motion cues from all available features and thereby achieving accurate object tracking. Extensive experiments demonstrate that TM2B achieves state-of-the-art performance on KITTI, NuScenes, and the Waymo Open Dataset, while running significantly faster than previous leading methods: 56.8 FPS on a single NVIDIA 1080Ti GPU. The code is available at TM2B.
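To make the second mechanism concrete: "learnable query attention" as described in the abstract amounts to a small set of trained query vectors attending over all encoded features and returning their weighted sum. The sketch below is an illustrative NumPy reconstruction from that description only, not the authors' implementation; the function name, shapes, and the absence of key/value projections are assumptions for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def learnable_query_attention(features, queries):
    """Weighted sum of encoded features driven by learnable global queries.

    features: (N, d) encoded multi-scale features (used as keys and values).
    queries:  (M, d) learnable global query vectors (trained parameters).
    Returns:  (M, d) aggregated embeddings, one per query.
    """
    d = features.shape[-1]
    scores = queries @ features.T / np.sqrt(d)  # (M, N) query-key similarity
    weights = softmax(scores, axis=-1)          # attend over ALL features
    return weights @ features                   # (M, d) weighted sum

# Toy usage: 8 encoded feature vectors of dimension 4, 2 global queries.
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4))
q = rng.standard_normal((2, 4))
cues = learnable_query_attention(feats, q)
print(cues.shape)  # (2, 4)
```

Because the attention weights span every feature rather than a local window, each query can pool evidence globally, which matches the abstract's claim of extracting motion cues "from all available features".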
Pages: 7078 - 7085
Page count: 8