MonoSAID: Monocular 3D Object Detection based on Scene-Level Adaptive Instance Depth Estimation

Cited by: 0
Authors
Chenxing Xia
Wenjun Zhao
Huidan Han
Zhanpeng Tao
Bin Ge
Xiuju Gao
Kuan-Ching Li
Yan Zhang
Affiliations
[1] Anhui University of Science and Technology, College of Computer Science and Engineering
[2] Institute of Energy, College of Electrical and Information Engineering
[3] Hefei Comprehensive National Science Center, Department of Computer Science and Information Engineering
[4] Anhui Purvar Bigdata Technology Co. Ltd, The School of Electronics and Information Engineering
[5] Anyang Cigarette Factory
[6] China Tobacco Henan Industrial Co., Ltd.
[7] Anhui University of Science and Technology
[8] Providence University
[9] Anhui University
Source
Journal of Intelligent & Robotic Systems, 2024, 110(1)
Keywords
Monocular 3D object detection; Deep learning; Depth estimation; Autonomous driving;
DOI
Not available
Abstract
Monocular 3D object detection (Mono3OD) is a challenging yet cost-effective vision task in the fields of autonomous driving and mobile robotics. The lack of reliable depth information makes obtaining accurate 3D positional information extremely difficult. In recent years, center-guided monocular 3D object detectors have directly regressed the absolute depth of the object center on top of 2D detection. However, this approach relies heavily on local semantic information, ignoring contextual spatial cues and global-to-local visual correlations. Moreover, visual variations in the scene lead to unavoidable depth prediction errors for objects at different scales. To address these limitations, we propose a Mono3OD framework based on scene-level adaptive instance depth estimation (MonoSAID). First, the continuous depth range is discretized into multiple bins whose widths are adaptively generated from scene-level contextual semantic information. Then, the correlation between global contextual semantic features and local instance features is established, and the instance depth is obtained as a linear combination of the bin centers weighted by the probability distribution predicted from the local instance features. In addition, a multi-scale spatial perception attention module is designed to extract attention maps at various scales through pyramid pooling operations, which enlarges the model's receptive field and strengthens its multi-scale spatial perception, thereby improving its ability to model target objects. We conducted extensive experiments on the KITTI and Waymo datasets. The results show that MonoSAID effectively improves 3D detection accuracy and robustness, achieving state-of-the-art performance.
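As a rough illustration of the bin-based depth formulation summarized in the abstract, the following PyTorch sketch (a simplification written for this page, not the authors' released code; module and parameter names such as AdaptiveBinDepthHead, num_bins, and the depth range are assumptions) predicts adaptive bin widths from a pooled scene-level feature and computes each instance's depth as a probability-weighted linear combination of the resulting bin centers.

```python
# Minimal sketch of scene-adaptive depth bins + instance depth as an
# expectation over bin centers. Illustrative only; not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveBinDepthHead(nn.Module):
    def __init__(self, feat_dim=256, num_bins=64, depth_min=1.0, depth_max=80.0):
        super().__init__()
        self.depth_min, self.depth_max = depth_min, depth_max
        # Scene-level branch: global context -> normalized bin widths.
        self.bin_mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, num_bins),
        )
        # Instance-level branch: local (center/RoI) feature -> logits over bins.
        self.prob_mlp = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, num_bins),
        )

    def forward(self, scene_feat, instance_feat):
        # scene_feat:    (B, C)    pooled global feature of the whole image
        # instance_feat: (B, K, C) local features of K detected instances
        widths = F.softmax(self.bin_mlp(scene_feat), dim=-1)      # (B, N), sums to 1
        widths = widths * (self.depth_max - self.depth_min)       # metric bin widths
        edges = self.depth_min + torch.cumsum(widths, dim=-1)     # right bin edges
        left = torch.cat([torch.full_like(edges[:, :1], self.depth_min),
                          edges[:, :-1]], dim=-1)                 # left bin edges
        centers = 0.5 * (left + edges)                            # (B, N) bin centers

        probs = F.softmax(self.prob_mlp(instance_feat), dim=-1)   # (B, K, N)
        # Instance depth = probability-weighted linear combination of bin centers.
        depth = (probs * centers.unsqueeze(1)).sum(dim=-1)        # (B, K)
        return depth


if __name__ == "__main__":
    head = AdaptiveBinDepthHead()
    d = head(torch.randn(2, 256), torch.randn(2, 5, 256))
    print(d.shape)  # torch.Size([2, 5])
```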
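The multi-scale spatial perception attention mentioned in the abstract can likewise be approximated by a pyramid-pooling attention block along the lines of the hedged sketch below; the pooling scales, channel reduction, and residual reweighting are illustrative choices, not the paper's exact design.

```python
# Illustrative pyramid-pooling spatial attention: pool the feature map at
# several scales, project, upsample, and fuse into a spatial attention map
# that reweights the input features. Assumed design, not the paper's module.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PyramidPoolingAttention(nn.Module):
    def __init__(self, channels=256, pool_sizes=(1, 2, 4, 8)):
        super().__init__()
        self.pool_sizes = pool_sizes
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels // 4, kernel_size=1) for _ in pool_sizes
        )
        # Fuse the multi-scale maps into a single-channel spatial attention map.
        self.fuse = nn.Conv2d(len(pool_sizes) * (channels // 4), 1, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        pyramids = []
        for size, conv in zip(self.pool_sizes, self.branches):
            p = F.adaptive_avg_pool2d(x, output_size=size)            # (B, C, s, s)
            p = conv(p)                                               # reduce channels
            p = F.interpolate(p, size=(h, w), mode="bilinear",
                              align_corners=False)                    # back to full res
            pyramids.append(p)
        attn = torch.sigmoid(self.fuse(torch.cat(pyramids, dim=1)))   # (B, 1, H, W)
        return x * attn + x                                           # residual reweighting


if __name__ == "__main__":
    m = PyramidPoolingAttention()
    y = m(torch.randn(1, 256, 48, 160))
    print(y.shape)  # torch.Size([1, 256, 48, 160])
```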
Related Papers (50 in total)
  • [1] MonoSAID: Monocular 3D Object Detection based on Scene-Level Adaptive Instance Depth Estimation
    Xia, Chenxing
    Zhao, Wenjun
    Han, Huidan
    Tao, Zhanpeng
    Ge, Bin
    Gao, Xiuju
    Li, Kuan-Ching
    Zhang, Yan
    JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS, 2024, 110 (01)
  • [2] Monocular 3D object detection with thermodynamic loss and decoupled instance depth
    Liu, Gang
    Xie, Xiaoxiao
    Yu, Qingchen
    CONNECTION SCIENCE, 2024, 36 (01)
  • [3] Deep Optics for Monocular Depth Estimation and 3D Object Detection
    Chang, Julie
    Wetzstein, Gordon
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 10192 - 10201
  • [4] eGAC3D: enhancing depth adaptive convolution and depth estimation for monocular 3D object pose detection
    Ngo, Duc Tuan
    Bui, Minh-Quan Viet
    Nguyen, Duc Dung
    Pham, Hoang-Anh
    PEERJ COMPUTER SCIENCE, 2022, 8
  • [5] DID-M3D: Decoupling Instance Depth for Monocular 3D Object Detection
    Peng, Liang
    Wu, Xiaopei
    Yang, Zheng
    Liu, Haifeng
    Cai, Deng
    COMPUTER VISION - ECCV 2022, PT I, 2022, 13661 : 71 - 88
  • [6] MonoDFNet: Monocular 3D Object Detection with Depth Fusion and Adaptive Optimization
    Gao, Yuhan
    Wang, Peng
    Li, Xiaoyan
    Sun, Mengyu
    Di, Ruohai
    Li, Liangliang
    Hong, Wei
    SENSORS, 2025, 25 (03)
  • [7] Exploiting Ground Depth Estimation for Mobile Monocular 3D Object Detection
    Zhou, Yunsong
    Liu, Quan
    Zhu, Hongzi
    Li, Yunzhe
    Chang, Shan
    Guo, Minyi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (04) : 3079 - 3093
  • [8] Task-Aware Monocular Depth Estimation for 3D Object Detection
    Wang, Xinlong
    Yin, Wei
    Kong, Tao
    Jiang, Yuning
    Li, Lei
    Shen, Chunhua
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 12257 - 12264
  • [9] Monocular 3D object detection for construction scene analysis
    Shen, Jie
    Jiao, Lang
    Zhang, Cong
    Peng, Keran
    COMPUTER-AIDED CIVIL AND INFRASTRUCTURE ENGINEERING, 2024, 39 (09) : 1370 - 1389
  • [10] MoGDE: Boosting Mobile Monocular 3D Object Detection with Ground Depth Estimation
    Zhou, Yunsong
    Liu, Quan
    Zhu, Hongzi
    Li, Yunzhe
    Chang, Shan
    Guo, Minyi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,