MonoSAID: Monocular 3D Object Detection based on Scene-Level Adaptive Instance Depth Estimation

Cited by: 0
Authors
Chenxing Xia
Wenjun Zhao
Huidan Han
Zhanpeng Tao
Bin Ge
Xiuju Gao
Kuan-Ching Li
Yan Zhang
Affiliations
[1] College of Computer Science and Engineering, Anhui University of Science and Technology
[2] Institute of Energy, Hefei Comprehensive National Science Center
[3] Anhui Purvar Bigdata Technology Co., Ltd.
[4] Anyang Cigarette Factory, China Tobacco Henan Industrial Co., Ltd.
[5] College of Electrical and Information Engineering, Anhui University of Science and Technology
[6] Department of Computer Science and Information Engineering, Providence University
[7] The School of Electronics and Information Engineering, Anhui University
Keywords
Monocular 3D object detection; Deep learning; Depth estimation; Autonomous driving
DOI: Not available
Abstract
Monocular 3D object detection (Mono3OD) is a challenging yet cost-effective vision task in autonomous driving and mobile robotics. The lack of reliable depth information makes it extremely difficult to obtain accurate 3D positional information. In recent years, center-guided monocular 3D object detectors have directly regressed the absolute depth of the object center on top of 2D detection. However, this approach relies heavily on local semantic information and ignores contextual spatial cues and global-to-local visual correlations. Moreover, visual variations across scenes lead to unavoidable depth prediction errors for objects at different scales. To address these limitations, we propose a Mono3OD framework based on scene-level adaptive instance depth estimation (MonoSAID). First, the continuous depth range is discretized into multiple bins whose widths are adaptively generated from scene-level contextual semantic information. Then, instance depth is recovered by correlating global contextual semantic features with the local semantic features of each instance and expressing the depth as a probability distribution over the bins, i.e., a linear combination of the bin centers weighted by the predicted probabilities. In addition, a multi-scale spatial perception attention module is designed to extract attention maps at multiple scales through pyramid pooling, enlarging the model's receptive field and strengthening its multi-scale spatial perception, which in turn improves its ability to model target objects. Extensive experiments on the KITTI and Waymo datasets show that MonoSAID effectively improves 3D detection accuracy and robustness and achieves state-of-the-art performance.
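The following is a minimal, hypothetical PyTorch sketch of the bin-based depth recovery described in the abstract, not the authors' implementation: a pooled scene-level feature predicts adaptive bin widths over a metric depth range, an instance feature predicts a probability distribution over those bins, and the instance depth is the probability-weighted linear combination of the bin centers. Module names, feature dimensions, the number of bins, and the 0-80 m depth range are illustrative assumptions.

```python
# Hypothetical sketch (not the authors' code): scene-adaptive depth bins plus
# probabilistic instance depth, as summarized in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveBinDepth(nn.Module):
    def __init__(self, scene_dim=256, inst_dim=256, num_bins=80,
                 depth_min=0.0, depth_max=80.0):
        super().__init__()
        self.depth_min, self.depth_max = depth_min, depth_max
        # Scene branch: global context feature -> normalized bin widths.
        self.bin_head = nn.Sequential(
            nn.Linear(scene_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, num_bins))
        # Instance branch: local ROI feature -> logits over the bins.
        self.prob_head = nn.Sequential(
            nn.Linear(inst_dim, 256), nn.ReLU(inplace=True),
            nn.Linear(256, num_bins))

    def forward(self, scene_feat, inst_feat):
        # scene_feat: (B, scene_dim) pooled global feature per image.
        # inst_feat:  (B, N, inst_dim) per-instance (object) features.
        widths = F.softmax(self.bin_head(scene_feat), dim=-1)   # (B, K), sums to 1
        widths = widths * (self.depth_max - self.depth_min)     # metric bin widths
        edges = self.depth_min + torch.cumsum(widths, dim=-1)   # right bin edges
        edges = F.pad(edges, (1, 0), value=self.depth_min)      # prepend left edge
        centers = 0.5 * (edges[:, :-1] + edges[:, 1:])          # (B, K) bin centers

        probs = F.softmax(self.prob_head(inst_feat), dim=-1)    # (B, N, K)
        # Instance depth = linear combination of bin centers weighted by probs.
        depth = (probs * centers.unsqueeze(1)).sum(dim=-1)      # (B, N)
        return depth


if __name__ == "__main__":
    model = AdaptiveBinDepth()
    scene = torch.randn(2, 256)        # two images
    inst = torch.randn(2, 5, 256)      # five candidate objects per image
    print(model(scene, inst).shape)    # torch.Size([2, 5])
```

Because the bin widths are regenerated per image from the scene-level feature, the same instance-level probability head can adapt its effective depth resolution to near-range and far-range scenes without changing the number of bins.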