HCPVF: Hierarchical Cascaded Point-Voxel Fusion for 3D Object Detection

Cited by: 6
Authors
Fan, Baojie [1, 2, 3]
Zhang, Kexin [1, 2, 3]
Tian, Jiandong [4]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Coll Automat, Nanjing 210023, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Coll Artificial Intelligence, Nanjing 210023, Peoples R China
[3] Wuzhou Univ, Guangxi Key Lab Machine Vis & Intelligent Control, Wuzhou 543002, Peoples R China
[4] Chinese Acad Sci, Shenyang Inst Automat, State Key Lab Robot, Beijing 100045, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Three-dimensional displays; Feature extraction; Point cloud compression; Proposals; Object detection; Detectors; Transformers; 3D object detection; BEV; voxel; point cloud;
DOI
10.1109/TCSVT.2023.3268849
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Subject classification codes
0808; 0809
Abstract
With the rapid development of 3D sensors, point-cloud-based 3D object detection is attracting increasing attention from both industry and academia and is widely applied in fields such as robotics and autonomous driving. However, balancing detection accuracy and speed remains a challenging problem. In this paper, we study this issue and propose a novel and effective 3D point cloud object detection network based on hierarchical cascaded point-voxel fusion, called HCPVF. First, a novel bird's-eye-view (BEV) attention mechanism with linear complexity is developed to improve the point cloud feature backbone network. It is easy to implement, mining point-to-point similarity in the BEV with two cascaded linear layers and two normalization layers. This operation captures long-range dependencies and reduces the uneven sampling of sparse BEV features, making the extracted point cloud features more discriminative. Second, the proposed HCPVF module is equipped with a dual-level hierarchical cascaded detection head, consisting of a voxel level followed by a point level. The voxel level is composed of coarse region-of-interest (RoI) pooling and fine RoI pooling, which cooperate to aggregate voxel features from different grid divisions and predict relatively coarse detection boxes. The point level is based on a Key Points Transformer. It first encodes the spatial context information between the original points and the voxel-level boxes. Then, a novel dual-weighted decoder enhances the context interaction by weighting the channel and spatial dimensions to obtain more accurate detection results. This design combines the high computational efficiency of the voxel-based method with the more complete spatial information of the point-based method, fusing low-level voxel features and high-level point features through a hierarchical cascaded strategy. Extensive experiments demonstrate that the proposed HCPVF achieves state-of-the-art 3D detection performance while maintaining computational efficiency on both the Waymo Open Dataset and the highly competitive KITTI benchmark.
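
The abstract describes the BEV attention only in words: two cascaded linear layers, two normalization layers, and linear complexity. The sketch below shows one way such a mechanism could look. It is not the authors' implementation. It assumes a design in which each BEV cell attends to a small set of learnable memory slots rather than to every other cell. The function name `bev_linear_attention`, the weights `w_key` and `w_value`, the slot count, and the choice of softmax followed by L1 normalization are all assumptions made for illustration.

```python
import numpy as np

def bev_linear_attention(features, w_key, w_value, eps=1e-9):
    """Hypothetical linear-complexity attention over flattened BEV cells.

    features: (N, C) array, one row per BEV cell.
    w_key:    (C, S) weights of the first linear layer (S memory slots, assumed).
    w_value:  (S, C) weights of the second linear layer.
    """
    # First linear layer: similarity of every cell to each of the S slots.
    attn = features @ w_key                                   # (N, S)
    # First normalization (assumed): numerically stable softmax over the N cells.
    attn = np.exp(attn - attn.max(axis=0, keepdims=True))
    attn = attn / attn.sum(axis=0, keepdims=True)
    # Second normalization (assumed): L1-normalize each cell's weights over the slots.
    attn = attn / (attn.sum(axis=1, keepdims=True) + eps)
    # Second linear layer: project back to C channels.
    return attn @ w_value                                     # (N, C)

# Usage with random data; sizes are arbitrary illustration values.
rng = np.random.default_rng(0)
cells = rng.normal(size=(64, 16))                 # 64 BEV cells, 16 channels
out = bev_linear_attention(cells,
                           rng.normal(size=(16, 8)),
                           rng.normal(size=(8, 16)))
print(out.shape)
```

Under these assumptions the cost is proportional to N x S x C, so it grows linearly with the number of BEV cells N, whereas pairwise self-attention grows with N squared.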
Pages: 8997-9009
Page count: 13
Related papers
50 in total
  • [21] CasFormer: Cascaded Transformer Based on Dynamic Voxel Pyramid for 3D Object Detection from Point Clouds
    Li, Xinglong
    Zhang, Xiaowei
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III, 2024, 14427 : 299 - 311
  • [22] PVDECONV: POINT-VOXEL DECONVOLUTION FOR AUTOENCODING CAD CONSTRUCTION IN 3D
    Cherenkova, Kseniya
    Aouada, Djamila
    Gusev, Gleb
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 2741 - 2745
  • [23] PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection
    Shi, Shaoshuai
    Jiang, Li
    Deng, Jiajun
    Wang, Zhe
    Guo, Chaoxu
    Shi, Jianping
    Wang, Xiaogang
    Li, Hongsheng
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2023, 131 (02) : 531 - 551
  • [24] Hierarchical Point Attention for Indoor 3D Object Detection
    Shu, Manli
    Xue, Le
    Yu, Ning
    Martin-Martin, Roberto
    Xiong, Caiming
    Goldstein, Tom
    Niebles, Juan Carlos
    Xu, Ran
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024, : 4245 - 4251
  • [25] Point-Voxel Based Geometry-Adaptive Network for 3D Point Cloud Analysis
    Zhao, Tian-Meng
    Zeng, Hui
    Zhang, Bao-Qing
    Liu, Hong-Min
    Fan, Bin
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (05) : 1167 - 1179
  • [26] Voxel Transformer for 3D Object Detection
    Mao, Jiageng
    Xue, Yujing
    Niu, Minzhe
    Bai, Haoyue
    Feng, Jiashi
    Liang, Xiaodan
    Xu, Hang
    Xu, Chunjing
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 3144 - 3153
  • [27] PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection
    Leng, Zhaoqi
    Sun, Pei
    He, Tong
    Anguelov, Dragomir
    Tan, Mingxing
    2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2024, 2024, : 4238 - 4244
  • [28] Cascaded Cross-Modality Fusion Network for 3D Object Detection
    Chen, Zhiyu
    Lin, Qiong
    Sun, Jing
    Feng, Yujian
    Liu, Shangdong
    Liu, Qiang
    Ji, Yimu
    Xu, He
    SENSORS, 2020, 20 (24) : 1 - 14
  • [29] From Voxel to Point: IoU-guided 3D Object Detection for Point Cloud with Voxel-to-Point Decoder
    Li, Jiale
    Dai, Hang
    Shao, Ling
    Ding, Yong
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4622 - 4631
  • [30] A Hierarchical Graph Network for 3D Object Detection on Point Clouds
    Chen, Jintai
    Lei, Biwen
    Song, Qingyu
    Ying, Haochao
    Chen, Danny Z.
    Wu, Jian
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, : 389 - 398