Semi-supervised 3D object detection based on frustum transformation and RGB voxel grid

Cited by: 0
Authors
Wang, Yan [1 ]
Yuan, Tiantian [1 ]
Hu, Bin [1 ,2 ]
Li, Yao [2 ]
Affiliations
[1] Technical College for the Deaf, Tianjin University of Technology, Tianjin 300384, China
[2] School of Microelectronics, Tianjin University, Tianjin 300072, China
Keywords
Semi-supervised learning
DOI: 10.3788/IRLA20240206
Abstract
Objective In autonomous driving, high-precision object detection is crucial for safety and efficiency. A common approach is to use voxel-based methods, whose accuracy is sensitive to the quantization grid size: smaller grids make the algorithm more computationally expensive, while larger grids increase quantization loss, discarding precise position information and fine detail. Successive convolution and down-sampling operations further weaken the precise localization signals in the point cloud. To improve the orientation perception and accuracy of object detection, we propose a frustum-transform-based method that extracts features from RGB images and fuses them with distance information from LiDAR, optimizing the strategy for extracting orientation features from the 3D point cloud. To reduce the model's dependence on annotated data, we also design a semi-supervised learning architecture with an adaptive pseudo-labeling method, which further reduces the false alarm rate of the group voting-based method.

Methods We propose a LiDAR-RGB fusion network based on the frustum transform (Fig.1). Texture information is extracted from the RGB image by a deep network and fused with distance information from the LiDAR to maintain the integrity of the 3D spatial features (Fig.2). The weights of the voxel spatial features are then optimized by a channel attention module (Fig.3). Finally, a semi-supervised learning architecture (Fig.4) reduces the false alarm rate through a spatial feature fusion module (Fig.5) and a group-based voting module, while a contrastive learning module improves the reliability of the detection.

Results and Discussions The proposed method was evaluated on the KITTI dataset (Tab.1). It achieved 56.30% accuracy on pedestrian detection and 75.88% on vehicle detection while running at 21 FPS. In the ablation study of the LRFN (LiDAR-RGB Fusion Network) model (Tab.2), the RVFM (RGB Voxel Feature Module) improved accuracy on occluded objects (Fig.6-7). The channel attention module was compared against other fusion modules (Tab.3, Fig.8). In the semi-supervised learning experiments, our teacher model was compared with the 3DIoUMatch model (Tab.4), validating its effectiveness, and in the ablation study (Tab.5) the full model improved the baseline by 8.61%. These results show a clear improvement over existing methods and highlight the detection contribution of the RVFM and the teacher model.
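To make the fusion step concrete, below is a minimal sketch, not the authors' released code, of the two operations the Methods paragraph describes: lifting RGB image features into a voxel grid by projecting voxel centers through a pinhole camera, and re-weighting the fused voxel channels with a squeeze-and-excitation style attention block. The grid layout, feature width, and intrinsics K are illustrative assumptions.

```python
# Hypothetical sketch of an RGB voxel feature module: project voxel centers
# into the image plane, sample 2D features there, then apply channel attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

def lift_image_features_to_voxels(img_feat, voxel_centers, K):
    """Sample 2D image features at the projection of each 3D voxel center.

    img_feat:      (C, H, W) feature map extracted from the RGB image
    voxel_centers: (N, 3) voxel centers in the camera frame
    K:             (3, 3) pinhole intrinsics (an assumed calibration)
    returns:       (N, C) per-voxel image features (zeros if out of view)
    """
    C, H, W = img_feat.shape
    uvw = voxel_centers @ K.T                         # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)     # perspective divide
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1) * 2 - 1,
                        uv[:, 1] / (H - 1) * 2 - 1], dim=-1)
    sampled = F.grid_sample(img_feat[None], grid[None, None],
                            align_corners=True)        # (1, C, 1, N)
    feats = sampled[0, :, 0].T                         # (N, C)
    # Zero out voxels behind the camera or projecting outside the image.
    valid = (grid.abs().max(dim=-1).values <= 1) & (voxel_centers[:, 2] > 0)
    return feats * valid[:, None]

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style re-weighting of voxel feature channels."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, voxel_feats):                    # (N, C)
        weights = self.fc(voxel_feats.mean(dim=0, keepdim=True))  # (1, C)
        return voxel_feats * weights

if __name__ == "__main__":
    img_feat = torch.randn(64, 48, 156)                # toy feature map
    centers = torch.rand(1000, 3) * torch.tensor([20., 4., 40.])
    K = torch.tensor([[70., 0., 78.], [0., 70., 24.], [0., 0., 1.]])
    v = lift_image_features_to_voxels(img_feat, centers, K)
    v = ChannelAttention(64)(v)
    print(v.shape)                                     # torch.Size([1000, 64])
```

In the paper's pipeline the sampling would additionally be weighted by the LiDAR depth distribution so that image features land only in occupied depth bins; the uniform sampling above is a simplification.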
Conclusions In this study, we propose a 3D object detection technique based on the frustum transform and a semi-supervised learning architecture. The method maps 2D image features into 3D space, generates homogeneous RGB image voxel features using the LiDAR depth distribution, adaptively selects the voxel space, and strengthens the fused feature representation through the channel attention module; targets are then detected by a 3D region proposal network. In the ablation experiments (Tab.2), the RGB image feature module improved the detection accuracy of the baseline model, and the RVFM effectively resolved the orientation and proximity problems observed in the visual samples (Fig.6-7). Additionally, the SFF (Spatial Feature Fusion) and GBV (Group-based Voting) modules were proposed to reduce the false alarm rate, and the contrastive learning module was introduced to improve the consistency of the student model's outputs across different views. The experimental results (Tab.1) show that the proposed LRFN-S (LiDAR-RGB Fusion Network-SLL) achieves strong performance, with 75.88% and 56.30% accuracy on the KITTI automobile and pedestrian detection benchmarks, respectively. © 2024 Chinese Society of Astronautics. All rights reserved.
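The semi-supervised side can likewise be sketched. The following is a hedged illustration, not the paper's code: teacher predictions from several augmented views are grouped by overlap, a group survives only if enough views agree (group-based voting), and a consensus box becomes a pseudo-label only if its score clears an adaptive per-class threshold. The axis-aligned BEV IoU, the min_votes count, and the percentile rule are simplifying assumptions standing in for the GBV and adaptive pseudo-labeling modules.

```python
# Hypothetical sketch: group-based voting over multi-view teacher detections
# with an adaptive (percentile-based) per-class score threshold.
import numpy as np

def bev_iou(a, b):
    """Axis-aligned bird's-eye-view IoU between boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def vote_pseudo_labels(views, score_history, min_votes=2, pct=80, iou_thr=0.5):
    """views: list over augmented views; each is a list of (box, score, cls).
    score_history: dict cls -> recent teacher scores (drives the threshold).
    Returns consensus (box, cls) pairs passing voting + adaptive threshold."""
    flat = [det for view in views for det in view]
    used, labels = [False] * len(flat), []
    for i, (box_i, _, cls_i) in enumerate(flat):
        if used[i]:
            continue
        # Group same-class detections that overlap the seed box; with one
        # detection per object per view, group size counts agreeing views.
        group = [j for j, (box_j, _, cls_j) in enumerate(flat)
                 if not used[j] and cls_j == cls_i
                 and bev_iou(box_i, box_j) >= iou_thr]
        for j in group:
            used[j] = True
        scores = [flat[j][1] for j in group]
        thr = np.percentile(score_history[cls_i], pct)  # adaptive per class
        if len(group) >= min_votes and max(scores) >= thr:
            boxes = np.stack([flat[j][0] for j in group])
            labels.append((boxes.mean(axis=0), cls_i))  # consensus box
    return labels

if __name__ == "__main__":
    views = [[(np.array([0.0, 0.0, 2.0, 4.0]), 0.9, "Car")],
             [(np.array([0.1, 0.0, 2.1, 4.0]), 0.8, "Car")]]
    hist = {"Car": [0.5, 0.6, 0.7, 0.8, 0.9]}
    print(vote_pseudo_labels(views, hist))
```

In a full teacher-student loop, the surviving consensus boxes would supervise the student on unlabeled scans, with the score history refreshed from recent teacher outputs so the per-class threshold adapts as training progresses.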
Related Papers
50 records (first 10 shown)
  • [1] Transferable Semi-Supervised 3D Object Detection From RGB-D Data
    Tang, Yew Siang
    Lee, Gim Hee
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 1931 - 1940
  • [2] Semi-supervised 3D Object Detection with PatchTeacher and PillarMix
    Wu, Xiaopei
    Peng, Liang
    Xie, Liang
    Hou, Yuenan
    Lin, Binbin
    Huang, Xiaoshui
    Liu, Haifeng
    Cai, Deng
    Ouyang, Wanli
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6, 2024, : 6153 - 6161
  • [3] Semi-supervised 3D Object Detection with Proficient Teachers
    Yin, Junbo
    Fang, Jin
    Zhou, Dingfu
    Zhang, Liangjun
    Xu, Cheng-Zhong
    Shen, Jianbing
    Wang, Wenguan
    COMPUTER VISION, ECCV 2022, PT XXXVIII, 2022, 13698 : 727 - 743
  • [4] Learning with Noisy Data for Semi-Supervised 3D Object Detection
    Chen, Zehui
    Li, Zhenyu
    Wang, Shuo
    Fu, Dengpan
    Zhao, Feng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 6906 - 6916
  • [5] A semi-supervised 3D object detection method for autonomous driving
    Zhang, Jiacheng
    Liu, Huafeng
    Lu, Jianfeng
    DISPLAYS, 2022, 71
  • [6] Semi-Supervised Learning for RGB-D Object Recognition
    Cheng, Yanhua
    Zhao, Xin
    Huang, Kaiqi
    Tan, Tieniu
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 2377 - 2382
  • [7] A-Teacher: Asymmetric Network for 3D Semi-Supervised Object Detection
    Wang, Hanshi
    Zhang, Zhipeng
    Gao, Jin
    Hu, Weiming
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 14978 - 14987
  • [8] A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection
    Zhang, Dingyuan
    Liang, Dingkang
    Zou, Zhikang
    Li, Jingyu
    Ye, Xiaoqing
    Liu, Zhe
    Tan, Xiao
    Bai, Xiang
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 8339 - 8349
  • [9] Frustum PointNets for 3D Object Detection from RGB-D Data
    Qi, Charles R.
    Liu, Wei
    Wu, Chenxia
    Su, Hao
    Guibas, Leonidas J.
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 918 - 927
  • [10] Joint Semi-Supervised and Active Learning via 3D Consistency for 3D Object Detection
    Hwang, Sihwan
    Kim, Sanmin
    Kim, Youngseok
    Kum, Dongsuk
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA, 2023, : 4819 - 4825