SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation

Cited by: 230
Authors
Liu, Zechen [1]
Wu, Zizhang [1]
Toth, Roland [2]
Affiliations
[1] ZongMu Tech, Beijing, People's Republic of China
[2] TU/e (Eindhoven University of Technology), Eindhoven, Netherlands
DOI
10.1109/CVPRW50498.2020.00506
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Estimating the 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving. In monocular vision, successful methods have mainly been based on two ingredients: (i) a network generating 2D region proposals, and (ii) an R-CNN structure predicting 3D object pose from the acquired regions of interest. We argue that the 2D detection network is redundant and introduces non-negligible noise for 3D detection. Hence, in this paper we propose a novel 3D object detection method, named SMOKE, that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second contribution, we propose a multi-step disentangling approach for constructing the 3D bounding box, which significantly improves both training convergence and detection accuracy. In contrast to previous 3D detection techniques, our method does not require complicated pre/post-processing, extra data, or a refinement stage. Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset, giving state-of-the-art results on both 3D object detection and bird's-eye-view evaluation. The code is available at https://github.com/lzccccc/SMOKE.
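The abstract describes building a 3D bounding box from a single keypoint estimate plus regressed 3D variables, but gives no formulas. The following is only a minimal sketch of that general idea, assuming a standard pinhole camera and a KITTI-style camera frame (yaw about the Y axis). It is not the authors' implementation. The function names `backproject_keypoint` and `box_corners`, the intrinsics `fx, fy, cx, cy`, and all numeric values are hypothetical and chosen purely for illustration.

```python
import math


def backproject_keypoint(u, v, depth, fx, fy, cx, cy):
    """Lift an image keypoint (u, v) with a known depth to a 3D point
    in the camera frame, using the pinhole model (assumed here)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def box_corners(center, dims, yaw):
    """Return the 8 corners of a box centred at `center` with
    dims = (height, width, length), rotated by `yaw` about the Y axis."""
    h, w, l = dims
    x0, y0, z0 = center
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (l / 2, -l / 2):
        for dy in (h / 2, -h / 2):
            for dz in (w / 2, -w / 2):
                # Rotation about Y: x' = c*dx + s*dz, z' = -s*dx + c*dz
                corners.append((x0 + c * dx + s * dz,
                                y0 + dy,
                                z0 - s * dx + c * dz))
    return corners


# Hypothetical values: keypoint at pixel (700, 200), depth 20 m,
# made-up intrinsics, and a car-sized box with a 0.3 rad yaw.
center = backproject_keypoint(700.0, 200.0, 20.0, 720.0, 720.0, 610.0, 175.0)
corners = box_corners(center, (1.5, 1.6, 4.0), 0.3)
```

In this sketch the keypoint is treated as the projected 3D box center, so the depth, dimensions and yaw stand in for the "regressed 3D variables" mentioned in the abstract; how SMOKE actually parameterizes and regresses them is not stated in this record.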
Pages: 4289-4298
Number of pages: 10