A multilevel fusion network for 3D object detection

Cited by: 5

Authors:

Xia, Chunlong [1]

Wei, Ping [1]

Wei, Wenwen [1]

Zheng, Nanning [1]

Affiliations:

[1] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an, People's Republic of China

Source:

NEUROCOMPUTING | 2021 / Volume 437

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation;

Keywords:

3D object detection; Multilevel fusion; Neural networks; SEGMENTATION;

DOI:

10.1016/j.neucom.2021.01.025

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

3D object detection is an important yet challenging problem in a myriad of vision, robotics, and human-machine interaction applications. Given an RGB-D image, the task is to infer the class labels and the 3D bounding boxes of the objects in the image. While previous studies have made remarkable progress over the past decade, how to effectively exploit feature fusion with neural networks to boost 3D object detection performance remains an open problem. This paper proposes a multilevel fusion network (MFN) model to detect 3D objects in RGB-D images. The MFN model contains two streams of neural networks, which respectively extract the RGB and depth features with cascaded convolutional modules. To effectively exploit the information of 3D objects, a multilevel fusion mechanism is adopted to fuse the convolutional RGB and depth features at multiple levels. To train the network, we propose a new weighted loss function by encoding the difference of geometric attributes on 3D bounding box regression. Since the original depth data is full of noisy holes, we also develop an adaptive filtering algorithm to restore and correct the depth images. We test the proposed model on challenging RGB-D datasets. The experimental results on the datasets demonstrate the strength and advantages of the proposed model. (c) 2021 Elsevier B.V. All rights reserved.
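The abstract mentions that raw depth images contain noisy holes that must be restored before feature extraction. The record does not describe the paper's adaptive filtering algorithm, so the following is only a minimal illustrative sketch of one common stand-in: filling each hole with the median of valid neighbours, growing the window until a valid neighbour is found. The function name `fill_depth_holes`, the parameters `max_radius` and `hole_value`, and the convention that a hole is encoded as 0.0 are all assumptions made for this example, not details taken from the paper.

```python
from statistics import median


def fill_depth_holes(depth, max_radius=3, hole_value=0.0):
    """Fill hole pixels with the median of valid neighbouring depths.

    Illustrative stand-in only, not the paper's adaptive filter.
    The window grows from radius 1 up to max_radius until at least one
    valid neighbour is found; pixels with no valid neighbour stay holes.
    """
    rows, cols = len(depth), len(depth[0])
    filled = [row[:] for row in depth]  # copy so the input is not modified
    for r in range(rows):
        for c in range(cols):
            if depth[r][c] != hole_value:
                continue  # valid measurement, keep as-is
            for radius in range(1, max_radius + 1):
                # Collect valid depths from the original image inside the window.
                neighbours = [
                    depth[i][j]
                    for i in range(max(0, r - radius), min(rows, r + radius + 1))
                    for j in range(max(0, c - radius), min(cols, c + radius + 1))
                    if depth[i][j] != hole_value
                ]
                if neighbours:
                    filled[r][c] = median(neighbours)
                    break
    return filled


# Tiny 3x3 depth patch (hypothetical values, in metres) with one hole in the centre.
depth = [
    [1.0, 1.0, 1.2],
    [1.0, 0.0, 1.2],
    [1.1, 1.1, 1.2],
]
restored = fill_depth_holes(depth)
```

Reading neighbours from the original image rather than the partially filled copy keeps the result independent of scan order; a median is used instead of a mean so that a single outlier depth near an object boundary does not skew the filled value.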

Pages: 107-117

Page count: 11
