SOccDPT: 3D Semantic Occupancy From Dense Prediction Transformers Trained Under Memory Constraints

Cited by: 0
Authors
Ganesh, Aditya Nalgunda [1 ]
Affiliations
[1] PES Univ, Dept Comp Sci, Bengaluru, Karnataka, India
Keywords
3D Vision; Semantic occupancy; Depth perception; Occupancy network;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We present SOccDPT, a memory-efficient approach for 3D semantic occupancy prediction from monocular image input using dense prediction transformers. To address the limitations of existing methods trained on structured traffic datasets, we train our model on unstructured datasets, including the Indian Driving Dataset and the Bengaluru Driving Dataset. Our semi-supervised training pipeline allows SOccDPT to learn from datasets with limited labels by replacing much of the manual labeling with pseudo-ground-truth labels, producing our Bengaluru Semantic Occupancy Dataset. This broader training enhances our model's ability to handle unstructured traffic scenarios effectively. To overcome memory limitations during training, we introduce patch-wise training, in which we select a subset of parameters to train each epoch, reducing memory usage during auto-grad graph construction. In the context of unstructured traffic and memory-constrained training and inference, SOccDPT outperforms existing disparity estimation approaches with an RMSE of 9.1473, achieves a semantic segmentation IoU of 46.02%, and operates at a competitive frequency of 69.47 Hz. We make our code and semantic occupancy dataset public(1).
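The patch-wise training idea from the abstract (training only a subset of parameters each epoch so that gradient and optimizer state for the rest need not be kept) can be sketched as below. This is a minimal illustrative sketch in PyTorch under assumed details; the function name `select_trainable_subset`, the random selection scheme, and the toy model are our own illustration, not code from the SOccDPT repository.

```python
import random
import torch
import torch.nn as nn

def select_trainable_subset(model: nn.Module, fraction: float = 0.25, seed=None):
    """Freeze all parameters, then unfreeze a random subset for this epoch.

    Returns the list of trainable parameters, so the optimizer can be
    built over only that subset (smaller optimizer state, and gradients
    are not stored for frozen parameters).
    """
    rng = random.Random(seed)
    params = list(model.parameters())
    for p in params:
        p.requires_grad_(False)          # freeze everything first
    k = max(1, int(len(params) * fraction))
    for p in rng.sample(params, k):
        p.requires_grad_(True)           # unfreeze this epoch's subset
    return [p for p in params if p.requires_grad]

# Toy training loop: a different parameter subset is trainable each epoch.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
for epoch in range(3):
    trainable = select_trainable_subset(model, fraction=0.5, seed=epoch)
    opt = torch.optim.Adam(trainable, lr=1e-3)  # optimizer sees only the subset
    x, y = torch.randn(32, 8), torch.randn(32, 4)
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()  # .grad buffers are allocated only for the subset
    opt.step()
```

The memory saving comes from two places: frozen parameters accumulate no `.grad` buffers, and the optimizer holds moment estimates only for the selected subset.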
Pages: 2201-2212
Page count: 12
Related Papers
50 in total
  • [1] LinkOcc: 3D Semantic Occupancy Prediction With Temporal Association
    Ouyang, Wenzhe
    Xu, Zenglin
    Shen, Bin
    Wang, Jinghua
    Xu, Yong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (02) : 1374 - 1384
  • [2] Vision Transformers: From Semantic Segmentation to Dense Prediction
    Zhang, Li
    Lu, Jiachen
    Zheng, Sixiao
    Zhao, Xinxuan
    Zhu, Xiatian
    Fu, Yanwei
    Xiang, Tao
    Feng, Jianfeng
    Torr, Philip H. S.
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (12) : 6142 - 6162
  • [3] Dense Semantic 3D Reconstruction
    Hane, Christian
    Zach, Christopher
    Cohen, Andrea
    Pollefeys, Marc
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2017, 39 (09) : 1730 - 1743
  • [4] GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
    Huang, Yuanhui
    Zheng, Wenzhao
    Zhang, Yunpeng
    Zhou, Jie
    Lu, Jiwen
    COMPUTER VISION - ECCV 2024, PT XXVII, 2025, 15085 : 376 - 393
  • [5] Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution
    Sze, Samuel
    Kunze, Lars
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 1286 - 1293
  • [6] Real-time Semantic 3D Dense Occupancy Mapping with Efficient Free Space Representations
    Zhong, Yuanxin
    Peng, Huei
    2022 IEEE 25TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS (ITSC), 2022, : 230 - 236
  • [7] Fully Sparse 3D Occupancy Prediction
    Liu, Haisong
    Chen, Yang
    Wang, Haiguang
    Yang, Zetong
    Li, Tianyu
    Zeng, Jia
    Chen, Li
    Li, Hongyang
    Wang, Limin
    COMPUTER VISION - ECCV 2024, PT XXV, 2025, 15083 : 54 - 71
  • [8] Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction
    Huang, Yuanhui
    Zheng, Wenzhao
    Zhang, Yunpeng
    Zhou, Jie
    Lu, Jiwen
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 9223 - 9232
  • [9] Semantic Segmentation on 3D Occupancy Grids for Automotive Radar
    Prophet, Robert
    Deligiannis, Anastasios
    Fuentes-Michel, Juan-Carlos
    Weber, Ingo
    Vossiek, Martin
    IEEE ACCESS, 2020, 8 : 197917 - 197930
  • [10] Self-Supervised 3D Semantic Occupancy Prediction from Multi-View 2D Surround Images
    Abualhanud, S.
    Erahan, E.
    Mehltretter, M.
    PFG-JOURNAL OF PHOTOGRAMMETRY REMOTE SENSING AND GEOINFORMATION SCIENCE, 2024, 92 (05): : 483 - 498