Dynamic Point-Pixel Feature Alignment for Multimodal 3-D Object Detection

Cited by: 3
Authors
Wang, Juncheng [1 ]
Kong, Xiangbo [2 ]
Nishikawa, Hiroki [3 ]
Lian, Qiuyou [4 ]
Tomiyama, Hiroyuki [5 ]
Affiliations
[1] Ritsumeikan Univ, Grad Sch Sci & Engn, Kusatsu, Shiga 5258577, Japan
[2] Toyama Prefectural Univ, Fac Engn, Dept Intelligent Robot, Toyama 9390398, Japan
[3] Osaka Univ, Grad Sch Informat Sci & Technol, Dept Informat Syst Engn, Informat Syst Synth Grp, Osaka 5650871, Japan
[4] South China Univ Technol, Grad Sch Mech & Automot Engn, Guangzhou 510641, Guangdong, Peoples R China
[5] Ritsumeikan Univ, Coll Sci & Engn, Kusatsu, Shiga 5258577, Japan
Keywords
Three-dimensional displays; Object detection; Laser radar; Point cloud compression; Feature extraction; Semantics; Cameras; 3-D object detection; autonomous driving; multimodal fusion; point clouds
DOI
10.1109/JIOT.2023.3329884
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
Detecting small or distant objects is a major challenge in 3-D object detection for autonomous driving, whether from RGB images or from LiDAR point clouds. Despite the growing popularity of sensor fusion for this task, existing fusion methods do not adequately address the challenges specific to small objects, such as semantic misalignment caused by occlusion and calibration errors. To address this issue, we propose the dynamic point-pixel feature alignment network (DPPFA-Net) for multimodal 3-D small object detection, which introduces memory-based point-pixel fusion (MPPF) modules, deformable point-pixel fusion (DPPF) modules, and semantic alignment evaluator (SAE) modules. More concretely, the MPPF module automatically performs intramodal and cross-modal feature interactions. The intramodal interaction reduces sensitivity to noise points, while the explicit cross-modal interaction, based on a memory bank, eases network learning and yields a more comprehensive and discriminative feature representation. The DPPF module interacts only with pixels at key positions selected by a sampling strategy. This design not only guarantees low computational complexity but also enables adaptive fusion, which is especially beneficial for high-resolution images. The SAE module guarantees semantic alignment of the fused features, enhancing the robustness and reliability of the fusion process. Furthermore, we construct a simulated multimodal noise data set that enables quantitative analysis of the robustness of multimodal methods under varying degrees of multimodal noise. Extensive experiments on the KITTI benchmark and on challenging multimodal noise cases show that DPPFA-Net achieves a new state of the art, highlighting its effectiveness in detecting small objects.
Compared with the first-place method on the KITTI leaderboard, our method achieves higher average precision by 2.07%, 6.52%, 7.18%, and 6.22% under the varying degrees of multimodal noise.
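The DPPF module described in the abstract fuses each LiDAR point's feature with image features sampled only at a few key pixel positions around the point's projection. The following is a minimal sketch of that idea, not the authors' implementation: the offset prediction, nearest-pixel sampling, and softmax weighting shown here are illustrative assumptions (the paper's actual sampling and fusion operators may differ).

```python
import numpy as np

def deformable_point_pixel_fusion(point_feats, image_feats, proj_uv,
                                  offsets, weights):
    """Fuse point features with image features at a few key pixels.

    point_feats: (N, C)    per-point features
    image_feats: (H, W, C) image feature map
    proj_uv:     (N, 2)    projected pixel coords (row, col) per point
    offsets:     (N, K, 2) sampling offsets per point (in the real
                 network these would be predicted by a small head)
    weights:     (N, K)    unnormalised scores over the K sampled pixels
    """
    H, W, _ = image_feats.shape
    # Key sampling locations: projection + offsets, clamped to bounds.
    locs = np.rint(proj_uv[:, None, :] + offsets).astype(int)
    locs[..., 0] = np.clip(locs[..., 0], 0, H - 1)
    locs[..., 1] = np.clip(locs[..., 1], 0, W - 1)
    sampled = image_feats[locs[..., 0], locs[..., 1]]   # (N, K, C)
    # Softmax-normalised weighted sum over the K key pixels.
    w = np.exp(weights - weights.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    img_part = (w[..., None] * sampled).sum(axis=1)     # (N, C)
    # Fuse by simple addition (an assumption; the paper's operator
    # could instead be concatenation or gating).
    return point_feats + img_part
```

Because each point attends to only K pixels rather than the full feature map, the cost grows with the number of points, not with image resolution, which is the property the abstract highlights for high-resolution inputs.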
Pages: 11327-11340
Number of pages: 14