CenterFormer: Center-Based Transformer for 3D Object Detection

Cited by: 61
Authors
Zhou, Zixiang [1, 2]
Zhao, Xiangchen [1]
Wang, Yu [1]
Wang, Panqu [1]
Foroosh, Hassan [2]
Affiliations
[1] TuSimple, San Diego, CA 92122 USA
[2] University of Central Florida, Computational Imaging Lab, Orlando, FL 32816 USA
Keywords
LiDAR point cloud; 3D object detection; Transformer; Multi-frame fusion
DOI
10.1007/978-3-031-19839-7_29
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Query-based transformers have shown great potential for constructing long-range attention in many image-domain tasks, but have rarely been considered in LiDAR-based 3D object detection due to the overwhelming size of point cloud data. In this paper, we propose CenterFormer, a center-based transformer network for 3D object detection. CenterFormer first uses a center heatmap to select center candidates on top of a standard voxel-based point cloud encoder. It then uses the feature of each center candidate as the query embedding in the transformer. To further aggregate features from multiple frames, we design an approach to fuse features through cross-attention. Lastly, regression heads are added to predict bounding boxes from the output center feature representation. Our design reduces the convergence difficulty and computational complexity of the transformer structure. The results show significant improvements over the strong baseline of anchor-free object detection networks. CenterFormer achieves state-of-the-art performance for a single model on the Waymo Open Dataset, with 73.7% mAPH on the validation set and 75.6% mAPH on the test set, significantly outperforming all previously published CNN and transformer-based methods. Our code is publicly available at https://github.com/TuSimple/centerformer
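The pipeline described in the abstract (select center candidates from a heatmap, use each candidate's feature as a query, fuse features from another frame through cross-attention) can be illustrated with a minimal pure-Python sketch. It rests on assumptions that do not come from the paper or from the TuSimple/centerformer repository: a single-class BEV heatmap and H x W x C feature maps stored as nested lists, a previous frame already aligned to the current BEV grid, plain top-k selection without peak suppression, a square context window around each center, and single-head scaled dot-product attention with no learned projections. All function names are hypothetical.

```python
import math


def select_center_candidates(heatmap, k):
    # Hypothetical helper: plain top-k over a single-class BEV heatmap.
    # Assumption: no max-pooling suppression of neighboring peaks.
    cells = [(score, r, c) for r, row in enumerate(heatmap) for c, score in enumerate(row)]
    cells.sort(key=lambda t: t[0], reverse=True)
    return [(r, c) for _, r, c in cells[:k]]


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    # Single-head scaled dot-product attention; no learned Q/K/V projections.
    d = len(query)
    scores = [sum(q * kv for q, kv in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]


def neighborhood(feature_map, r, c, radius):
    # Feature vectors in a (2*radius+1)^2 window around (r, c), clipped at the borders.
    h, w = len(feature_map), len(feature_map[0])
    return [
        feature_map[i][j]
        for i in range(max(0, r - radius), min(h, r + radius + 1))
        for j in range(max(0, c - radius), min(w, c + radius + 1))
    ]


def center_queries_with_fusion(heatmap, current_feats, previous_feats, k, radius=1):
    # For each top-k center, the current-frame feature is the query and a window of
    # (assumed pre-aligned) previous-frame features supplies the keys and values.
    fused = []
    for r, c in select_center_candidates(heatmap, k):
        query = current_feats[r][c]
        context = neighborhood(previous_feats, r, c, radius)
        attended = cross_attention(query, context, context)
        # Residual connection: add the attended context back onto the query.
        fused.append([q + a for q, a in zip(query, attended)])
    return fused
```

In this sketch each of the k queries attends to at most (2 * radius + 1)^2 context vectors, so the attention cost grows with the number of selected centers rather than with the full BEV grid. Under the stated assumptions, this illustrates why using a small set of center candidates as queries keeps a transformer tractable on large point-cloud feature maps; the actual network adds learned projections, multiple heads and further components not reproduced here.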
Pages: 496-513
Number of pages: 18