SOccDPT: 3D Semantic Occupancy From Dense Prediction Transformers Trained Under Memory Constraints

Cited by: 0
Author
Ganesh, Aditya Nalgunda [1]
Affiliation
[1] PES Univ, Dept Comp Sci, Bengaluru, Karnataka, India
Keywords
3D Vision; Semantic occupancy; Depth perception; Occupancy network;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present SOccDPT, a memory-efficient approach for 3D semantic occupancy prediction from monocular image input using dense prediction transformers. To address the limitations of existing methods trained on structured traffic datasets, we train our model on unstructured datasets, including the Indian Driving Dataset and the Bengaluru Driving Dataset. Our semi-supervised training pipeline lets SOccDPT learn from datasets with limited labels: it reduces the need for manual labeling by substituting pseudo-ground-truth labels, which we use to produce our Bengaluru Semantic Occupancy Dataset. This broader training enhances the model's ability to handle unstructured traffic scenarios. To overcome memory limitations during training, we introduce patch-wise training, in which we select a subset of parameters to train each epoch, reducing memory usage during autograd graph construction. In the context of unstructured traffic and memory-constrained training and inference, SOccDPT outperforms existing disparity estimation approaches with an RMSE of 9.1473, achieves a semantic segmentation IoU of 46.02%, and operates at a competitive frequency of 69.47 Hz. We make our code and semantic occupancy dataset public.
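The abstract describes patch-wise training only as selecting a subset of parameters to train each epoch. The following is a minimal sketch of one way such a schedule could look, not the authors' implementation: the round-robin rotation, the contiguous grouping, and the names `make_patches`, `trainable_mask`, and the example parameter names are all assumptions made for illustration.

```python
from typing import Dict, List


def make_patches(param_names: List[str], num_patches: int) -> List[List[str]]:
    """Split parameter names into contiguous groups ("patches").

    Hypothetical helper: returns at most num_patches groups of roughly
    equal size. The grouping rule is an assumption, not from the paper.
    """
    if num_patches < 1:
        raise ValueError("num_patches must be at least 1")
    if not param_names:
        raise ValueError("param_names must not be empty")
    size = -(-len(param_names) // num_patches)  # ceiling division
    return [param_names[i:i + size] for i in range(0, len(param_names), size)]


def trainable_mask(
    param_names: List[str], patches: List[List[str]], epoch: int
) -> Dict[str, bool]:
    """Mark which parameters would receive gradients in a given epoch.

    Assumed schedule: patches are visited round-robin, one per epoch.
    Parameters outside the active patch are frozen, so no gradient
    buffers need to be kept for them during that epoch.
    """
    active = set(patches[epoch % len(patches)])
    return {name: name in active for name in param_names}


# Hypothetical parameter names, for illustration only.
names = ["encoder.0", "encoder.1", "decoder.0", "depth_head", "seg_head"]
patches = make_patches(names, num_patches=2)
for epoch in range(3):
    mask = trainable_mask(names, patches, epoch)
    print(epoch, [name for name, on in mask.items() if on])
```

In a PyTorch training loop, a mask like this would typically be applied by setting `requires_grad` on each parameter before the epoch starts; whether SOccDPT does exactly this is not stated in the abstract.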
Pages: 2201-2212
Page count: 12