MVX-Net: Multimodal VoxelNet for 3D Object Detection

Cited by: 0

Authors:

Sindagi, Vishwanath A. [1]

Zhou, Yin [2]

Tuzel, Oncel [2]

Affiliations:

[1] Johns Hopkins Univ, Dept Elect & Comp Engn, Baltimore, MD 21218 USA

[2] Apple Inc, AI Res, Cupertino, CA 95014 USA

Source:

2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA) | 2019

Keywords:

REPRESENTATION;

DOI:

10.1109/icra.2019.8794195

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Many recent works on 3D object detection have focused on designing neural network architectures that can consume point cloud data. While these approaches demonstrate encouraging performance, they are typically based on a single modality and are unable to leverage information from other modalities, such as a camera. Although a few approaches fuse data from different modalities, these methods either use a complicated pipeline to process the modalities sequentially, or perform late fusion and are unable to learn interactions between different modalities at early stages. In this work, we present PointFusion and VoxelFusion: two simple yet effective early-fusion approaches to combine the RGB and point cloud modalities, by leveraging the recently introduced VoxelNet architecture. Evaluation on the KITTI dataset demonstrates significant improvements in performance over approaches that use only point cloud data. Furthermore, the proposed method provides results competitive with the state-of-the-art multimodal algorithms, achieving top-2 ranking in five of the six bird's eye view and 3D detection categories on the KITTI benchmark, by using a simple single-stage network.
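The abstract does not spell out how the two fusion schemes work, so the following is only a minimal sketch of what point-level early fusion could look like, under stated assumptions: each LiDAR point is projected into the image with a 3x4 calibration matrix, and the image feature at that pixel is appended to the point before any voxel encoding. The function names (`project_point`, `fuse_points_with_image`), the nested-list feature map, and the zero-fill for points outside the image are all hypothetical choices made for illustration, not details taken from this record.

```python
import math
from typing import List, Sequence, Tuple


def project_point(P: Sequence[Sequence[float]],
                  xyz: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Apply a 3x4 projection matrix P to a 3D point; return raw (u*w, v*w, w)."""
    h = [xyz[0], xyz[1], xyz[2], 1.0]  # homogeneous coordinates
    uw, vw, w = (sum(P[r][c] * h[c] for c in range(4)) for r in range(3))
    return uw, vw, w


def fuse_points_with_image(points: List[Sequence[float]],
                           feature_map: List[List[List[float]]],
                           P: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append the image feature under each projected point to that point.

    points:      list of (x, y, z, reflectance), assumed to be in the camera frame
    feature_map: H x W x C nested lists (assumed output of some 2D CNN)
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    fused = []
    for x, y, z, r in points:
        uw, vw, w = project_point(P, (x, y, z))
        img_feat = [0.0] * channels  # assumption: zero-fill when no pixel is hit
        if w > 0:  # only points in front of the camera can be projected
            col, row = math.floor(uw / w), math.floor(vw / w)
            if 0 <= row < height and 0 <= col < width:
                img_feat = list(feature_map[row][col])
        fused.append([x, y, z, r] + img_feat)
    return fused


# Toy usage: identity-like projection and a 2 x 3 feature map with 2 channels.
P_toy = [[1.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 0.0]]
fmap_toy = [[[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]],
            [[0.5, 0.5], [7.0, 8.0], [5.0, 6.0]]]
fused_toy = fuse_points_with_image([(2.0, 1.0, 1.0, 0.5)], fmap_toy, P_toy)
```

In this toy call the point projects to column 2, row 1, so the two-channel feature stored there is appended to the original four point values. A voxel-level variant would, by analogy, attach pooled image features to each non-empty voxel instead of to each point; that is likewise an assumption here rather than something the abstract states.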


Pages: 7276-7282

Page count: 7
