MonoSAID: Monocular 3D Object Detection based on Scene-Level Adaptive Instance Depth Estimation

Authors
Chenxing Xia
Wenjun Zhao
Huidan Han
Zhanpeng Tao
Bin Ge
Xiuju Gao
Kuan-Ching Li
Yan Zhang
Affiliations
[1] Anhui University of Science and Technology, College of Computer Science and Engineering
[2] Institute of Energy, College of Electrical and Information Engineering
[3] Hefei Comprehensive National Science Center, Department of Computer Science and Information Engineering
[4] Anhui Purvar Bigdata Technology Co. Ltd, The School of Electronics and Information Engineering
[5] Anyang Cigarette Factory
[6] China Tobacco Henan Industrial Co., Ltd.
[7] Anhui University of Science and Technology
[8] Providence University
[9] Anhui University
Keywords
Monocular 3D object detection; Deep learning; Depth estimation; Autonomous driving
DOI
Not available
Abstract
Monocular 3D object detection (Mono3OD) is a challenging yet cost-effective vision task in autonomous driving and mobile robotics. Without reliable depth information, accurate 3D localization is extremely difficult. Recent center-guided monocular 3D object detectors directly regress the absolute depth of the object center from 2D detections. However, this approach relies heavily on local semantic information and ignores contextual spatial cues and global-to-local visual correlations. Moreover, visual variations in the scene inevitably cause depth prediction errors for objects at different scales. To address these limitations, we propose MonoSAID, a Mono3OD framework based on scene-level adaptive instance depth estimation. First, the continuous depth range is discretized into multiple bins, and the distribution of bin widths is generated adaptively from scene-level contextual semantic information. Then, a correlation is established between global contextual semantic features and the local semantic features of each instance, and the instance depth is obtained as a linear combination of the bin centers weighted by the probability distribution derived from the local instance features. In addition, a multi-scale spatial perception attention module extracts attention maps at several scales through pyramid pooling operations. This design enlarges the model's receptive field and strengthens its multi-scale spatial perception, improving its ability to model target objects. We conducted extensive experiments on the KITTI and Waymo datasets. The results show that MonoSAID effectively improves 3D detection accuracy and robustness and achieves state-of-the-art performance.
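The bin-based depth step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation; it only shows the general idea of adaptive depth bins followed by a probability-weighted combination of bin centers. The function names, the 0 to 80 m depth range, and the example logits are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)

def adaptive_bin_centers(width_logits, d_min, d_max):
    """Map scene-level logits (hypothetical network output) to bin centers.

    The softmax yields positive widths that sum to the depth range, so the
    bins always tile [d_min, d_max] while their sizes adapt to the scene.
    """
    widths = softmax(width_logits) * (d_max - d_min)
    edges = d_min + np.concatenate([[0.0], np.cumsum(widths)])
    return 0.5 * (edges[:-1] + edges[1:])

def instance_depth(bin_logits, centers):
    """Depth per instance: bin centers weighted by per-instance probabilities."""
    probs = softmax(bin_logits, axis=-1)  # shape (num_instances, num_bins)
    return probs @ centers                # shape (num_instances,)

# Assumed depth range of 0-80 m and 4 bins, chosen only for this example.
width_logits = np.zeros(4)                # equal logits give equal bin widths
centers = adaptive_bin_centers(width_logits, 0.0, 80.0)

# Two hypothetical instances: one undecided, one confident in the first bin.
bin_logits = np.array([[0.0, 0.0, 0.0, 0.0],
                       [100.0, 0.0, 0.0, 0.0]])
depths = instance_depth(bin_logits, centers)
print(centers, depths)
```

With equal width logits the bins are uniform, so the undecided instance lands at the middle of the range and the confident one near the first bin center. In a trained model both sets of logits would come from learned scene-level and instance-level features.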