MVX-Net: Multimodal VoxelNet for 3D Object Detection

Authors:

Sindagi, Vishwanath A. [1]

Zhou, Yin [2]

Tuzel, Oncel [2]

Affiliations:

[1] Johns Hopkins Univ, Dept Elect & Comp Engn, Baltimore, MD 21218 USA

[2] Apple Inc, AI Res, Cupertino, CA 95014 USA

Source:

2019 International Conference on Robotics and Automation (ICRA) | 2019

Keywords:

REPRESENTATION;

DOI:

10.1109/icra.2019.8794195

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Many recent works on 3D object detection have focused on designing neural network architectures that can consume point cloud data. While these approaches demonstrate encouraging performance, they are typically based on a single modality and are unable to leverage information from other modalities, such as a camera. Although a few approaches fuse data from different modalities, these methods either use a complicated pipeline to process the modalities sequentially, or perform late fusion and are unable to learn interactions between different modalities at early stages. In this work, we present PointFusion and VoxelFusion: two simple yet effective early-fusion approaches to combine the RGB and point cloud modalities, by leveraging the recently introduced VoxelNet architecture. Evaluation on the KITTI dataset demonstrates significant improvements in performance over approaches which only use point cloud data. Furthermore, the proposed method provides results competitive with the state-of-the-art multimodal algorithms, achieving a top-2 ranking in five of the six bird's-eye-view and 3D detection categories on the KITTI benchmark, by using a simple single-stage network.
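
To make the idea of point-level early fusion more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the LiDAR points are already expressed in the camera frame, that a 3x4 pinhole projection matrix is available, and that a 2D network has already produced a per-pixel feature map. The function name `fuse_points_with_image`, its parameters, and the choice of zero-filling points that fall outside the image are all hypothetical and chosen for illustration only. Plain Python lists are used to keep the example dependency-free.

```python
import math
from typing import List, Sequence, Tuple


def fuse_points_with_image(
    points: Sequence[Tuple[float, float, float]],
    feature_map: Sequence[Sequence[Sequence[float]]],
    proj: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[bool]]:
    """Append image features to each 3D point (illustrative sketch only).

    points:      N points (x, y, z), assumed to be in the camera frame.
    feature_map: image features indexed as feature_map[row][col][channel].
    proj:        hypothetical 3x4 pinhole projection matrix.

    Returns the fused per-point vectors [x, y, z, f_1, ..., f_C] and a
    flag per point telling whether it projected inside the image.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])

    fused: List[List[float]] = []
    valid: List[bool] = []
    for x, y, z in points:
        homo = (x, y, z, 1.0)
        # Project the homogeneous point: (u * depth, v * depth, depth).
        u_h, v_h, depth = (
            sum(p * q for p, q in zip(row, homo)) for row in proj
        )

        # Assumption: points without a valid pixel get zero image features.
        img_feat = [0.0] * channels
        inside = False
        if depth > 1e-6:  # keep only points in front of the camera
            u = math.floor(u_h / depth)  # pixel column
            v = math.floor(v_h / depth)  # pixel row
            if 0 <= u < width and 0 <= v < height:
                img_feat = list(feature_map[v][u])
                inside = True

        # Early fusion: concatenate geometry and image features per point,
        # before any voxel-level feature encoding is applied.
        fused.append([x, y, z, *img_feat])
        valid.append(inside)
    return fused, valid
```

In a pipeline of this kind, the fused per-point vectors would then be grouped into voxels and passed to the point cloud network, so that the detector sees both modalities from its first layer onward. A voxel-level variant would instead pool image features over the image region each voxel projects to.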

Citation

Pages: 7276-7282

Page count: 7
