Dynamic Point-Pixel Feature Alignment for Multimodal 3-D Object Detection

Cited: 3
Authors
Wang, Juncheng [1]
Kong, Xiangbo [2]
Nishikawa, Hiroki [3]
Lian, Qiuyou [4]
Tomiyama, Hiroyuki [5]
Affiliations
[1] Ritsumeikan Univ, Grad Sch Sci & Engn, Kusatsu, Shiga 5258577, Japan
[2] Toyama Prefectural Univ, Fac Engn, Dept Intelligent Robot, Toyama 9390398, Japan
[3] Osaka Univ, Grad Sch Informat Sci & Technol, Dept Informat Syst Engn, Informat Syst Synth Grp, Osaka 5650871, Japan
[4] South China Univ Technol, Grad Sch Mech & Automot Engn, Guangzhou 510641, Guangdong, Peoples R China
[5] Ritsumeikan Univ, Coll Sci & Engn, Kusatsu, Shiga 5258577, Japan
Keywords
Three-dimensional displays; Object detection; Laser radar; Point cloud compression; Feature extraction; Semantics; Cameras; 3-D object detection; autonomous driving; multimodal fusion; point clouds;
DOI
10.1109/JIOT.2023.3329884
Chinese Library Classification (CLC)
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Detecting small or distant objects is a major challenge for 3-D object detection in autonomous driving, whether from RGB images or from LiDAR point clouds. Despite the growing popularity of sensor fusion for this task, existing fusion methods do not adequately address the challenges of 3-D small object detection, such as the semantic misalignment of small objects caused by occlusion and calibration errors. To address this issue, we propose the Dynamic Point-Pixel Feature Alignment Network (DPPFA-Net) for multimodal 3-D small object detection, which introduces memory-based point-pixel fusion (MPPF) modules, deformable point-pixel fusion (DPPF) modules, and semantic alignment evaluator (SAE) modules. More concretely, the MPPF module automatically performs intramodal and cross-modal feature interactions: the intramodal interaction reduces sensitivity to noise points, while the explicit cross-modal interaction based on a memory bank eases network learning and yields a more comprehensive and discriminative feature representation. The DPPF module interacts only with pixels at key positions selected by a sampling strategy; this design guarantees low computational complexity while enabling adaptive fusion, which is especially beneficial for high-resolution images. The SAE module enforces semantic alignment of the fused features, enhancing the robustness and reliability of the fusion process. Furthermore, we construct a simulated multimodal noise dataset that enables quantitative analysis of the robustness of multimodal methods under varying degrees of multimodal noise. Extensive experiments on the KITTI benchmark and on challenging multimodal noisy cases show that DPPFA-Net achieves a new state of the art, highlighting its effectiveness in detecting small objects. Compared with the first-place method on the KITTI leaderboard, our method improves average precision by 2.07%, 6.52%, 7.18%, and 6.22% under the varying degrees of multimodal noise.
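The deformable fusion idea described in the abstract, namely letting each point interact only with a few key pixel positions around its projection, can be illustrated with a minimal sketch. The sketch below assumes a PyTorch setting; the module name DeformablePointPixelFusion, the tensor shapes, and the learned-offset plus attention design are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeformablePointPixelFusion(nn.Module):
    """Fuse each point feature with K image-feature samples taken around
    the point's projected pixel location (learned offsets + attention)."""

    def __init__(self, point_dim: int, img_dim: int, num_samples: int = 4):
        super().__init__()
        self.num_samples = num_samples
        self.offset_head = nn.Linear(point_dim, 2 * num_samples)   # (dx, dy) per sample
        self.attn_head = nn.Linear(point_dim, num_samples)         # per-sample weights
        self.out_proj = nn.Linear(point_dim + img_dim, point_dim)  # fused projection

    def forward(self, point_feats, proj_uv, img_feats):
        # point_feats: (N, Cp) per-point features
        # proj_uv:     (N, 2) projected pixel coords, normalized to [-1, 1]
        # img_feats:   (1, Ci, H, W) image feature map
        n = point_feats.shape[0]
        offsets = self.offset_head(point_feats).view(n, self.num_samples, 2)
        # Keep learned offsets small so samples stay near the projection.
        sample_uv = proj_uv.unsqueeze(1) + 0.05 * torch.tanh(offsets)      # (N, K, 2)
        # grid_sample expects a (1, N, K, 2) grid; output is (1, Ci, N, K).
        sampled = F.grid_sample(img_feats, sample_uv.unsqueeze(0),
                                align_corners=False)
        sampled = sampled.squeeze(0).permute(1, 2, 0)                      # (N, K, Ci)
        weights = torch.softmax(self.attn_head(point_feats), dim=-1)       # (N, K)
        img_ctx = (weights.unsqueeze(-1) * sampled).sum(dim=1)             # (N, Ci)
        return self.out_proj(torch.cat([point_feats, img_ctx], dim=-1))    # (N, Cp)

In this sketch each point queries only K image-feature samples via F.grid_sample and aggregates them with softmax attention, so the per-point cost is independent of image resolution, which mirrors the low-complexity, adaptive-fusion property the abstract attributes to the DPPF module. For example, DeformablePointPixelFusion(point_dim=64, img_dim=128) applied to 1024 point features and a (1, 128, H, W) feature map returns a (1024, 64) fused point feature tensor.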
Pages: 11327 - 11340
Number of pages: 14
Related papers
50 records in total
  • [41] Equal Emphasis on Data and Network: A Two-Stage 3D Point Cloud Object Detection Algorithm with Feature Alignment. Xiao, Kai; Li, Teng; Li, Jun; Huang, Da; Peng, Yuanxi. REMOTE SENSING, 2024, 16 (02).
  • [42] VoxelNextFusion: A Simple, Unified, and Effective Voxel Fusion Framework for Multimodal 3-D Object Detection. Song, Ziying; Zhang, Guoxin; Xie, Jun; Liu, Lin; Jia, Caiyan; Xu, Shaoqing; Wang, Zhepeng. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61: 1-12.
  • [43] Semantics feature sampling for point-based 3D object detection. Huang, Jing-Dong; Du, Ji-Xiang; Zhang, Hong-Bo; Liu, Huai-Jin. IMAGE AND VISION COMPUTING, 2024, 149.
  • [44] 3D Object Detection Based on Feature Fusion of Point Cloud Sequences. Zhai, Zhenyu; Wang, Qiantong; Pan, Zongxu; Hu, Wenlong; Hu, Yuxin. 2022 IEEE 17TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS (ICIEA), 2022: 1240-1245.
  • [45] Point Transformer-Based Salient Object Detection Network for 3-D Measurement Point Clouds. Wei, Zeyong; Chen, Baian; Wang, Weiming; Chen, Honghua; Wei, Mingqiang; Li, Jonathan. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62: 1-11.
  • [46] Accelerating Point-Voxel Representation of 3-D Object Detection for Automatic Driving. Cao, J.; Tao, C.; Zhang, Z.; Gao, Z.; Luo, X.; Zheng, S.; Zhu, Y. IEEE Transactions on Artificial Intelligence, 2024, 5 (01): 254-266.
  • [47] Sub-pixel feature extraction and edge detection in 3-d measuring using structured lights. Liang, Zhiguo; Xu, Ke; Xu, Jinwu; Song, Qiang. Jixie Gongcheng Xuebao/Chinese Journal of Mechanical Engineering, 2004, 40 (12): 96-99.
  • [48] Multimodal Object Query Initialization for 3D Object Detection. van Geerenstein, Mathijs R.; Ruppel, Felicia; Dietmayer, Klaus; Gavrila, Dariu M. 2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024), 2024: 12484-12491.
  • [49] F-PVNet: Frustum-Level 3-D Object Detection on Point-Voxel Feature Representation for Autonomous Driving. Tao, Chongben; Fu, Shiping; Wang, Chen; Luo, Xizhao; Li, Huayi; Gao, Zhen; Zhang, Zufeng; Zheng, Sifa. IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (09): 8031-8045.
  • [50] Exploring Pixel Alignment on Shallow Feature for Weakly Supervised Object Localization. Cao, Xinzi; Yang, Meng; Sun, Guoyin. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022.