MVX-Net: Multimodal VoxelNet for 3D Object Detection

Authors
Sindagi, Vishwanath A. [1]
Zhou, Yin [2]
Tuzel, Oncel [2]
Affiliations
[1] Johns Hopkins Univ, Dept Elect & Comp Engn, Baltimore, MD 21218 USA
[2] Apple Inc, AI Res, Cupertino, CA 95014 USA
Keywords
REPRESENTATION;
DOI
10.1109/icra.2019.8794195
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Many recent works on 3D object detection have focused on designing neural network architectures that can consume point cloud data. While these approaches demonstrate encouraging performance, they are typically based on a single modality and are unable to leverage information from other modalities, such as a camera. Although a few approaches fuse data from different modalities, these methods either use a complicated pipeline to process the modalities sequentially, or perform late fusion and are unable to learn interactions between different modalities at early stages. In this work, we present PointFusion and VoxelFusion: two simple yet effective early-fusion approaches that combine the RGB and point cloud modalities by leveraging the recently introduced VoxelNet architecture. Evaluation on the KITTI dataset demonstrates significant improvements in performance over approaches that use only point cloud data. Furthermore, the proposed method provides results competitive with state-of-the-art multimodal algorithms, achieving a top-2 ranking in five of the six bird's eye view and 3D detection categories on the KITTI benchmark, while using a simple single-stage network.
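The abstract does not spell out how the early fusion is carried out, so the following is only a minimal sketch of the general idea behind point-level early fusion, under stated assumptions: each 3D point is projected into the image with a 3x4 camera projection matrix, an image feature is looked up at that pixel, and it is concatenated with the point's own feature before any further processing. The function names (`project_point`, `fuse_point_features`), the nearest-cell lookup, the zero-fill for points outside the image, and the `stride` parameter are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import List, Optional, Sequence, Tuple


def project_point(
    P: Sequence[Sequence[float]], point: Sequence[float]
) -> Optional[Tuple[float, float]]:
    """Project a 3D point to pixel coordinates with a 3x4 projection matrix.

    Returns None when the point lies behind (or on) the camera plane.
    """
    hom = (point[0], point[1], point[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(row[i] * hom[i] for i in range(4)) for row in P)
    if w <= 0:
        return None
    return u / w, v / w


def fuse_point_features(
    points: Sequence[Sequence[float]],
    point_feats: Sequence[Sequence[float]],
    image_feats: Sequence[Sequence[Sequence[float]]],
    P: Sequence[Sequence[float]],
    stride: int = 1,
) -> List[List[float]]:
    """Concatenate each point feature with the image feature at its projection.

    image_feats[row][col] is a feature vector. `stride` is the assumed
    downsampling factor between image pixels and feature-map cells.
    Points that project outside the feature map receive a zero vector.
    """
    height = len(image_feats)
    width = len(image_feats[0])
    channels = len(image_feats[0][0])
    fused = []
    for pt, feat in zip(points, point_feats):
        img_feat = [0.0] * channels
        pix = project_point(P, pt)
        if pix is not None:
            col = int(pix[0] // stride)  # floor division keeps negatives out of range
            row = int(pix[1] // stride)
            if 0 <= row < height and 0 <= col < width:
                img_feat = list(image_feats[row][col])
        fused.append(list(feat) + img_feat)
    return fused
```

Because the concatenation happens per point, before any detection head runs, later layers can learn interactions between the two modalities, which is the contrast with late fusion drawn in the abstract.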
Pages: 7276 - 7282
Page count: 7
Related papers
50 in total
  • [1] Semantic Frustum Based VoxelNet for 3D Object Detection
    Chen, Feng
    Wu, Fei
    Huang, Qinghua
    Feng, Yujian
    Ge, Qi
    Ji, Yimu
    Hu, Chang-Hui
    Jing, Xiao-Yuan
    2020 CHINESE AUTOMATION CONGRESS (CAC 2020), 2020, : 7629 - 7634
  • [2] Method of 3D vehicle object detection based on improved VoxelNet
    Zhao, Yi-Fan
    Wu, Shao-Bo
    Dong, Shi-Peng
    Journal of Computers (Taiwan), 2021, 32 (01) : 242 - 255
  • [3] VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking
    Chen, Yukang
    Liu, Jianhui
    Zhang, Xiangyu
    Qi, Xiaojuan
    Jia, Jiaya
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 21674 - 21683
  • [4] MFF-Net: Multimodal Feature Fusion Network for 3D Object Detection
    Shi, Peicheng
    Liu, Zhiqiang
    Qi, Heng
    Yang, Aixi
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (03) : 5615 - 5637
  • [5] Multimodal Object Query Initialization for 3D Object Detection
    van Geerenstein, Mathijs R.
    Ruppel, Felicia
    Dietmayer, Klaus
    Gavrila, Dariu M.
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024), 2024, : 12484 - 12491
  • [6] VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
    Zhou, Yin
    Tuzel, Oncel
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 4490 - 4499
  • [7] Multimodal 3D Histogram for Moving Object Detection
    Mukherjee, Dibyendu
    Saha, Ashirbani
    Wu, Q. M. Jonathan
    Jiang, Wei
    2014 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC), 2014, : 2397 - 2402
  • [8] PPF-Net: Efficient Multimodal 3D Object Detection with Pillar-Point Fusion
    Zhang, Lingxiao
    Li, Changyong
    ELECTRONICS, 2025, 14 (04):
  • [9] Virtual Sparse Convolution for Multimodal 3D Object Detection
    Wu, Hai
    Wen, Chenglu
    Shi, Shaoshuai
    Li, Xin
    Wang, Cheng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 21653 - 21662
  • [10] Multimodal 3D Object Detection from Simulated Pretraining
    Brekke, Asmund
    Vatsendvik, Fredrik
    Lindseth, Frank
    NORDIC ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT, 2019, 1056 : 102 - 113