Adaptive temporal fusion network with depth supervision and modulation for robust three-dimensional object detection in complex scenes

Authors
Liu, Yifan [1]
Zhang, Yong [1]
Lan, Rukai [1]
Cui, Xiaopeng [2]
Xie, Linbo [3]
Wu, Zhaolong [1]
Affiliations
[1] Wuhan Univ Sci & Technol, Sch Informat Sci & Engn, Wuhan 430081, Peoples R China
[2] Naval Univ Engn, Natl Key Lab Electromagnet Energy, Wuhan 430030, Peoples R China
[3] Jiangnan Univ, Sch Internet Things Engn, Wuxi 214122, Jiangsu, Peoples R China
Keywords
Autonomous driving; Three-dimensional object detection; Multi-modal; Deformable attention; Temporal fusion
DOI
10.1016/j.engappai.2024.109988
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Autonomous driving perception relies on cameras and Light Detection and Ranging (LiDAR) sensors. Existing methods for LiDAR-camera fusion are primarily based on the Lift-Splat (LS) framework, which serves as a foundation for multi-modal fusion. However, these methods still face challenges such as unreliable depth information, insufficient dynamic perception, and limited robustness. This paper proposes a novel multi-modal three-dimensional (3D) detection method that optimizes depth by fully leveraging image and point cloud data and employs spatiotemporal deformable attention for adaptive fusion across frames. Specifically, we generate optimized depth maps through point clouds for depth supervision, refine the depth using Conditional Random Fields (CRF), and improve the fusion features by optimizing the depth estimation range. Additionally, we propose a dual-alignment method with spatiotemporal adaptive attention to acquire high-quality temporal features, allowing the model to learn beneficial information from adjacent frames. The proposed method achieves leading mean Average Precision (mAP) on mainstream 3D object detection datasets. Extensive experiments on multiple datasets demonstrate the superiority of the proposed method. Notably, our method remains effective even when a sensor fails, highlighting its potential to improve the robustness of autonomous perception in real-world scenarios.
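The depth-supervision step mentioned in the abstract (generating depth maps from point clouds) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple pinhole camera model, assumes the LiDAR points have already been transformed into the camera frame, and uses hypothetical names (`project_points_to_depth_map`, `min_depth`, `max_depth`) and hypothetical range values. The CRF refinement and the temporal fusion stages are not covered.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in the camera frame, z pointing forward


def project_points_to_depth_map(
    points_cam: Sequence[Point],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    width: int,
    height: int,
    min_depth: float = 1.0,   # hypothetical lower bound of the depth range
    max_depth: float = 60.0,  # hypothetical upper bound of the depth range
) -> List[List[Optional[float]]]:
    """Build a sparse depth map from 3D points using a pinhole projection.

    Pixels that receive no point stay None. When several points fall on the
    same pixel, the nearest one is kept, since it occludes the others.
    """
    depth_map: List[List[Optional[float]]] = [
        [None] * width for _ in range(height)
    ]
    for x, y, z in points_cam:
        # Discard points outside the assumed depth estimation range.
        if z < min_depth or z > max_depth:
            continue
        # Pinhole projection to pixel coordinates.
        u = math.floor(fx * x / z + cx)
        v = math.floor(fy * y / z + cy)
        if 0 <= u < width and 0 <= v < height:
            current = depth_map[v][u]
            if current is None or z < current:
                depth_map[v][u] = z
    return depth_map


if __name__ == "__main__":
    demo_points = [(0.0, 0.0, 10.0), (0.0, 0.0, 5.0), (1.0, 0.0, 10.0)]
    demo = project_points_to_depth_map(
        demo_points, fx=10.0, fy=10.0, cx=2.0, cy=2.0, width=4, height=4
    )
    for row in demo:
        print(row)
```

A sparse map of this kind could serve as a supervision target for a predicted per-pixel depth distribution in a Lift-Splat style pipeline; how the paper densifies or refines it is not specified in the abstract.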
Pages: 15