A multilevel fusion network for 3D object detection

Cited by: 5
Authors
Xia, Chunlong [1]
Wei, Ping [1]
Wei, Wenwen [1]
Zheng, Nanning [1]
Affiliations
[1] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
3D object detection; Multilevel fusion; Neural networks; SEGMENTATION;
DOI
10.1016/j.neucom.2021.01.025
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
3D object detection is an important yet challenging problem in a myriad of vision, robotics, and human-machine interaction applications. Given an RGB-D image, the task is to infer the class labels and the 3D bounding boxes of the objects in the image. While previous studies have made remarkable progress over the past decade, how to effectively exploit feature fusion with neural networks to boost 3D object detection performance remains an open problem. This paper proposes a multilevel fusion network (MFN) model to detect 3D objects in RGB-D images. The MFN model contains two streams of neural networks that extract the RGB and depth features, respectively, with cascaded convolutional modules. To effectively exploit the information of 3D objects, a multilevel fusion mechanism is adopted to fuse the convolutional RGB and depth features at multiple levels. To train the network, we propose a new weighted loss function that encodes the difference of geometric attributes in 3D bounding box regression. Since the original depth data is full of noisy holes, we also develop an adaptive filtering algorithm to restore and correct the depth images. We test the proposed model on challenging RGB-D datasets. The experimental results on the datasets demonstrate the strength and advantages of the proposed model. (c) 2021 Elsevier B.V. All rights reserved.
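The abstract mentions restoring depth images that contain noisy holes but does not describe the adaptive filtering algorithm itself. As a rough illustration of the general idea only, the sketch below fills each hole pixel with the median of nearby valid depths, growing the window until enough valid neighbours are found. This is not the paper's algorithm; the function name `fill_depth_holes`, the parameters `hole_value`, `min_valid`, and `max_radius`, and the choice of a median filter are all assumptions made for this example.

```python
from statistics import median


def fill_depth_holes(depth, hole_value=0.0, min_valid=3, max_radius=3):
    """Fill hole pixels with the median of nearby valid depths.

    Hypothetical helper, not the method from the paper. For each hole, the
    window radius grows from 1 up to max_radius until at least min_valid
    valid neighbours are found. Holes without enough valid neighbours
    within max_radius are left unchanged.
    """
    rows, cols = len(depth), len(depth[0])
    filled = [row[:] for row in depth]  # read from `depth`, write to the copy
    for r in range(rows):
        for c in range(cols):
            if depth[r][c] != hole_value:
                continue  # valid measurement, keep as-is
            for radius in range(1, max_radius + 1):
                neighbours = [
                    depth[rr][cc]
                    for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1))
                    if depth[rr][cc] != hole_value
                ]
                if len(neighbours) >= min_valid:
                    filled[r][c] = median(neighbours)
                    break
    return filled


# Toy 3x3 depth map (metres) with a single hole in the centre.
depth_map = [
    [1.0, 1.0, 1.2],
    [1.0, 0.0, 1.2],
    [1.1, 1.1, 1.2],
]
restored = fill_depth_holes(depth_map)
print(restored[1][1])
```

Reading from the original map while writing to a copy keeps filled values from propagating into later windows, so the result does not depend on scan order.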
Pages: 107-117
Number of pages: 11
Related papers
50 in total
  • [21] Deformable Feature Fusion Network for Multi-Modal 3D Object Detection
    Guo, Kun
    Gan, Tong
    Ding, Zhao
    Ling, Qiang
    2024 3RD INTERNATIONAL CONFERENCE ON ROBOTICS, ARTIFICIAL INTELLIGENCE AND INTELLIGENT CONTROL, RAIIC 2024, 2024, : 363 - 367
  • [22] High-order multilayer attention fusion network for 3D object detection
    Zhang, Baowen
    Zhao, Yongyong
    Su, Chengzhi
    Cao, Guohua
    ENGINEERING REPORTS, 2024, 6 (12)
  • [23] Fine-Grained Multilevel Fusion for Anti-Occlusion Monocular 3D Object Detection
    Liu, He
    Liu, Huaping
    Wang, Yikai
    Sun, Fuchun
    Huang, Wenbing
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4050 - 4061
  • [24] Towards Raw Sensor Fusion in 3D Object Detection
    Rovid, Andras
    Remeli, Viktor
    2019 IEEE 17TH WORLD SYMPOSIUM ON APPLIED MACHINE INTELLIGENCE AND INFORMATICS (SAMI 2019), 2019, : 293 - 298
  • [25] HFMDNet: Hierarchical Fusion and Multilevel Decoder Network for RGB-D Salient Object Detection
    Luo, Yi
    Shao, Feng
    Xie, Zhengxuan
    Wang, Huizhi
    Chen, Hangwei
    Mu, Baoyang
    Jiang, Qiuping
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 15
  • [26] CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection
    Pang, Su
    Morris, Daniel
    Radha, Hayder
    2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 10386 - 10393
  • [27] TBFNT3D: Two-Branch Fusion Network With Transformer for Multimodal Indoor 3D Object Detection
    Cheng, Jun
    Zhang, Sheng
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (10) : 6523 - 6530
  • [28] Adaptive learning point cloud and image diversity feature fusion network for 3D object detection
    Yan, Weiqing
    Liu, Shile
    Liu, Hao
    Yue, Guanghui
    Wang, Xuan
    Song, Yongchao
    Xu, Jindong
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (02) : 2825 - 2837
  • [29] VoPiFNet: Voxel-Pixel Fusion Network for Multi-Class 3D Object Detection
    Wang, Chia-Hung
    Chen, Hsueh-Wei
    Chen, Yi
    Hsiao, Pei-Yung
    Fu, Li-Chen
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, : 1 - 11
  • [30] VoPiFNet: Voxel-Pixel Fusion Network for Multi-Class 3D Object Detection
    Wang, Chia-Hung
    Chen, Hsueh-Wei
    Chen, Yi
    Hsiao, Pei-Yung
    Fu, Li-Chen
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, 25 (08) : 8527 - 8537