Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving

Cited by: 29
Authors
Li, Peixuan [1 ]
Jin, Jieyu [1 ]
Affiliations
[1] SAIC PP CEM, Shanghai, Peoples R China
DOI
10.1109/CVPR52688.2022.00386
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
While monocular 3D object detection and 2D multi-object tracking can be applied separately to image sequences in a frame-by-frame fashion, a stand-alone tracker cuts off the transmission of uncertainty from the 3D detector to tracking and cannot pass tracking error differentials back to the 3D detector. In this work, we propose jointly training 3D detection and 3D tracking from only monocular videos in an end-to-end manner. The key component is a novel spatial-temporal information flow module that aggregates geometric and appearance features to predict robust similarity scores across all objects in the current and past frames. Specifically, we leverage the attention mechanism of the transformer, in which self-attention aggregates spatial information within a specific frame, and cross-attention exploits the relations and affinities of all objects across the temporal domain of sequential frames. The affinities are then supervised to estimate the trajectory and guide the flow of information between corresponding 3D objects. In addition, we propose a temporal-consistency loss that explicitly incorporates 3D target motion modeling into the learning, making the 3D trajectory smooth in the world coordinate system. Time3D achieves 21.4% AMOTA and 13.6% AMOTP on the nuScenes 3D tracking benchmark, surpassing all published competitors while running at 38 FPS, and Time3D achieves 31.2% mAP and 39.4% NDS on the nuScenes 3D detection benchmark.
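The cross-attention step described in the abstract, in which attention weights between object features of the current and a past frame serve as pairwise similarity (affinity) scores for association, can be sketched as follows. This is a minimal NumPy illustration of scaled dot-product cross-attention under assumed shapes, not the paper's actual implementation; the function name `cross_frame_affinity` and all dimensions are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_affinity(curr_feats, past_feats):
    """Scaled dot-product cross-attention between object features of the
    current frame (queries) and a past frame (keys). Each row of the
    result is a distribution over past objects, usable as similarity
    scores for data association."""
    d_k = curr_feats.shape[-1]
    scores = curr_feats @ past_feats.T / np.sqrt(d_k)  # (N_curr, N_past)
    return softmax(scores, axis=-1)

# Toy example: 3 objects in the current frame, 2 in the past frame,
# each with an 8-dim aggregated geometric/appearance feature.
rng = np.random.default_rng(0)
curr = rng.standard_normal((3, 8))
past = rng.standard_normal((2, 8))
aff = cross_frame_affinity(curr, past)
print(aff.shape)  # (3, 2): one affinity row per current-frame object
```

In Time3D these affinities are additionally supervised against ground-truth track identities; the sketch above only shows the unsupervised forward computation.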
Pages: 3875-3884
Page count: 10
Related Papers
50 items in total (items [21]-[30] shown)
  • [21] Monocular 3D object detection via estimation of paired keypoints for autonomous driving
    Ji, Chaofeng
    Liu, Guizhong
    Zhao, Dan
    Multimedia Tools and Applications, 2022, 81 : 5973 - 5988
  • [22] OneTrack: Demystifying the Conflict Between Detection and Tracking in End-to-End 3D Trackers
    Wang, Qitai
    He, Jiawei
    Chen, Yuntao
    Zhang, Zhaoxiang
    COMPUTER VISION-ECCV 2024, PT VII, 2025, 15065 : 387 - 404
  • [23] Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving
    Ma, Xinzhu
    Wang, Zhihui
    Li, Haojie
    Zhang, Pengbo
    Ouyang, Wanli
    Fan, Xin
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 6850 - 6859
  • [24] An end-to-end framework for unconstrained monocular 3D hand pose estimation
    Sharma, Sanjeev
    Huang, Shaoli
    PATTERN RECOGNITION, 2021, 115
  • [25] Joint Monocular 3D Vehicle Detection and Tracking
    Hu, Hou-Ning
    Cai, Qi-Zhi
    Wang, Dequan
    Lin, Ji
    Sun, Min
    Krahenbuhl, Philipp
    Darrell, Trevor
    Yu, Fisher
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5389 - 5398
  • [26] A survey on 3D object detection in real time for autonomous driving
    Contreras, Marcelo
    Jain, Aayush
    Bhatt, Neel P.
    Banerjee, Arunava
    Hashemi, Ehsan
    FRONTIERS IN ROBOTICS AND AI, 2024, 11
  • [27] MSL3D: 3D object detection from monocular, stereo and point cloud for autonomous driving
    Chen, Wenyu
    Li, Peixuan
    Zhao, Huaici
    NEUROCOMPUTING, 2022, 494 : 23 - 32
  • [28] Rope3D: The Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task
    Ye, Xiaoqing
    Shu, Mao
    Li, Hanyu
    Shi, Yifeng
    Li, Yingying
    Wang, Guangjie
    Tan, Xiao
    Ding, Errui
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 21309 - 21318
  • [29] PointTrackNet: An End-to-End Network For 3-D Object Detection and Tracking From Point Clouds
    Wang, Sukai
    Sun, Yuxiang
    Liu, Chengju
    Liu, Ming
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2020, 5 (02) : 3206 - 3212
  • [30] 3D Object Detection for Autonomous Driving: A Survey
    Qian, Rui
    Lai, Xin
    Li, Xirong
    PATTERN RECOGNITION, 2022, 130