CenterFormer: Center-Based Transformer for 3D Object Detection

Cited by: 61

Authors:

Zhou, Zixiang [1,2]

Zhao, Xiangchen [1]

Wang, Yu [1]

Wang, Panqu [1]

Foroosh, Hassan [2]

Affiliations:

[1] TuSimple, San Diego, CA 92122 USA

[2] Univ Cent Florida, Computat Imaging Lab, Orlando, FL 32816 USA

Source:

COMPUTER VISION, ECCV 2022, PT XXXVIII | 2022 / Vol. 13698

Keywords:

LiDAR point cloud; 3D object detection; Transformer; Multi-frame fusion

DOI:

10.1007/978-3-031-19839-7_29

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Query-based transformers have shown great potential in constructing long-range attention in many image-domain tasks, but have rarely been considered in LiDAR-based 3D object detection due to the overwhelming size of point cloud data. In this paper, we propose CenterFormer, a center-based transformer network for 3D object detection. CenterFormer first uses a center heatmap to select center candidates on top of a standard voxel-based point cloud encoder. It then uses the feature of each center candidate as the query embedding in the transformer. To further aggregate features from multiple frames, we design an approach to fuse features through cross-attention. Lastly, regression heads are added to predict the bounding box from the output center feature representation. Our design reduces the convergence difficulty and computational complexity of the transformer structure. The results show significant improvements over the strong baseline of anchor-free object detection networks. CenterFormer achieves state-of-the-art performance for a single model on the Waymo Open Dataset, with 73.7% mAPH on the validation set and 75.6% mAPH on the test set, significantly outperforming all previously published CNN and transformer-based methods. Our code is publicly available at https://github.com/TuSimple/centerformer
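The pipeline described in the abstract (heatmap-based center selection, then center features used as transformer queries) can be illustrated with a minimal sketch. This is not the authors' implementation, which is in the repository linked above. The function names `select_center_queries` and `cross_attention`, the toy heatmap, and the plain single-head attention without learned projections are all assumptions made for illustration only.

```python
import heapq
import math


def select_center_queries(heatmap, features, k):
    """Pick the k highest-scoring cells of a 2D heatmap.

    Returns their (row, col) coordinates and the feature vectors stored
    at those cells, which then serve as query embeddings.
    """
    cells = [(score, (r, c))
             for r, row in enumerate(heatmap)
             for c, score in enumerate(row)]
    top = heapq.nlargest(k, cells)  # highest score first
    coords = [rc for _, rc in top]
    queries = [features[r][c] for r, c in coords]
    return coords, queries


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention (no learned projections)."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        logits = [scale * sum(a * b for a, b in zip(q, key)) for key in keys]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - peak) for x in logits]
        total = sum(weights)
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values)) / total
            for d in range(dim)
        ])
    return outputs


# Hypothetical toy data: a 2x3 heatmap and a 2-dim feature per cell.
heatmap = [[0.1, 0.9, 0.2],
           [0.3, 0.05, 0.8]]
features = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
            [[0.5, 0.5], [0.2, 0.1], [2.0, 0.0]]]

coords, queries = select_center_queries(heatmap, features, k=2)
flat = [vec for row in features for vec in row]  # all cells as keys/values
attended = cross_attention(queries, flat, flat)
```

Because only `k` center candidates act as queries, the attention cost in this sketch grows with `k` times the number of cells rather than with the square of the number of cells, which is consistent with the abstract's point about reduced computational complexity.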

Pages: 496-513

Page count: 18
