Dynamic Point-Pixel Feature Alignment for Multimodal 3-D Object Detection

Cited by: 3
Authors
Wang, Juncheng [1 ]
Kong, Xiangbo [2 ]
Nishikawa, Hiroki [3 ]
Lian, Qiuyou [4 ]
Tomiyama, Hiroyuki [5 ]
Affiliations
[1] Ritsumeikan Univ, Grad Sch Sci & Engn, Kusatsu, Shiga 5258577, Japan
[2] Toyama Prefectural Univ, Fac Engn, Dept Intelligent Robot, Toyama 9390398, Japan
[3] Osaka Univ, Grad Sch Informat Sci & Technol, Dept Informat Syst Engn, Informat Syst Synth Grp, Osaka 5650871, Japan
[4] South China Univ Technol, Grad Sch Mech & Automot Engn, Guangzhou 510641, Guangdong, Peoples R China
[5] Ritsumeikan Univ, Coll Sci & Engn, Kusatsu, Shiga 5258577, Japan
Keywords
Three-dimensional displays; Object detection; Laser radar; Point cloud compression; Feature extraction; Semantics; Cameras; 3-D object detection; autonomous driving; multimodal fusion; point clouds
DOI
10.1109/JIOT.2023.3329884
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Detection of small or distant objects is a major challenge in 3-D object detection for autonomous driving, whether using RGB images or LiDAR point clouds. Despite the growing popularity of sensor fusion for this task, existing fusion methods have not adequately accounted for the challenges of 3-D small object detection, such as the semantic misalignment of small objects caused by occlusion and calibration errors. To address this issue, we propose the dynamic point-pixel feature alignment network (DPPFA-Net) for multimodal 3-D small object detection, which introduces memory-based point-pixel fusion (MPPF) modules, deformable point-pixel fusion (DPPF) modules, and semantic alignment evaluator (SAE) modules. More concretely, the proposed MPPF module automatically performs intramodal and cross-modal feature interactions. The intramodal interaction reduces sensitivity to noise points, while the explicit cross-modal feature interaction based on the memory bank facilitates network learning and enables a more comprehensive and discriminative feature representation. The DPPF module establishes interactions exclusively with key-position pixels selected by a sampling strategy. This design not only guarantees low computational complexity but also enables adaptive fusion, which is especially beneficial for high-resolution images. The SAE module guarantees semantic alignment of the fused features, thereby enhancing the robustness and reliability of the fusion process. Furthermore, we construct a simulated multimodal noise data set, which enables quantitative analysis of the robustness of multimodal methods under varying degrees of multimodal noise. Extensive experiments on the KITTI benchmark and challenging multimodal noisy cases show that DPPFA-Net achieves a new state of the art, highlighting its effectiveness in detecting small objects.
Compared with the first-place method on the KITTI leaderboard, our proposed method improves average precision by 2.07%, 6.52%, 7.18%, and 6.22% under the varying degrees of multimodal noise.
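The sampling-based interaction of the DPPF module can be illustrated with a minimal sketch: each 3-D point attends only to K sampled key pixels around its projected image location, keeping the cost O(N·K) rather than O(N·H·W) over the full feature map. The function name `dppf_fuse`, the fixed offset table, and the dot-product attention form below are illustrative assumptions, not the paper's implementation (the paper's module learns its sampling offsets).

```python
import numpy as np

def dppf_fuse(point_feats, img_feats, ref_uv, offsets):
    """Hypothetical sketch of deformable point-pixel fusion.

    point_feats: (N, C) per-point features
    img_feats:   (H, W, C) image feature map
    ref_uv:      (N, 2) projected pixel location of each point (row, col)
    offsets:     (K, 2) sampling offsets around each reference pixel
                 (fixed here for illustration; learned in the paper)
    """
    N, C = point_feats.shape
    H, W, _ = img_feats.shape
    fused = np.empty_like(point_feats)
    for i in range(N):
        # K key-pixel locations = projected reference + sampling offsets
        uv = np.clip(ref_uv[i] + offsets, [0, 0], [H - 1, W - 1]).astype(int)
        keys = img_feats[uv[:, 0], uv[:, 1]]           # (K, C) sampled pixel features
        # attention weights from point/key-pixel similarity (softmax over K)
        logits = keys @ point_feats[i] / np.sqrt(C)
        w = np.exp(logits - logits.max())
        w /= w.sum()
        # residual fusion: point feature plus attention-weighted pixel context
        fused[i] = point_feats[i] + w @ keys
    return fused
```

Because only K pixels are touched per point, the same sketch scales to high-resolution feature maps, which is the adaptive, low-cost behavior the abstract attributes to DPPF.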
Pages: 11327-11340 (14 pages)