HybridOcc: NeRF Enhanced Transformer-Based Multi-Camera 3D Occupancy Prediction

Cited by: 1
Authors
Zhao, Xiao [1]
Chen, Bo [2]
Sun, Mingyang [1]
Yang, Dingkang [1]
Wang, Youxing [1, 2]
Zhang, Xukun [1]
Li, Mingcheng [1]
Kou, Dongliang [1]
Wei, Xiaoyi [1]
Zhang, Lihua [1, 3, 4]
Affiliations
[1] Fudan Univ, Acad Engn & Technol, Shanghai 200000, Peoples R China
[2] China FAW Grp Corp Ltd, Nanjing 211102, Peoples R China
[3] Minist Educ, Engn Res Ctr AI & Robot, Shanghai 200000, Peoples R China
[4] Jilin Prov Key Lab Intelligence Sci & Engn, Changchun 130000, Peoples R China
Source
Funding
National Key Research and Development Program of China
Keywords
Computer vision; autonomous driving; neural networks; semantic scene completion; 3D occupancy
DOI
10.1109/LRA.2024.3416798
Chinese Library Classification number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Vision-based 3D semantic scene completion (SSC) describes autonomous driving scenes through 3D volume representations. However, the occlusion of invisible voxels by scene surfaces makes it difficult for current SSC methods to hallucinate refined 3D geometry. This letter proposes HybridOcc, a method in which hybrid 3D volume query proposals are generated by a Transformer framework and a NeRF representation and refined in a coarse-to-fine SSC prediction framework. HybridOcc aggregates contextual features through the Transformer paradigm based on the hybrid query proposals, and combines this with the NeRF representation to obtain depth supervision. The Transformer branch operates at multiple scales and uses spatial cross-attention for the 2D-to-3D transformation. The newly designed NeRF branch implicitly infers scene occupancy, for both visible and invisible voxels, through volume rendering, and explicitly captures scene depth rather than generating RGB color. Furthermore, we present an innovative occupancy-aware ray sampling method that orients sampling toward the SSC task instead of focusing on the scene surface, further improving overall performance. Extensive experiments on the nuScenes and SemanticKITTI datasets demonstrate the effectiveness of HybridOcc on the SSC task.
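The abstract states that the NeRF branch captures scene depth through volume rendering rather than generating RGB color. As a rough illustration only, the sketch below computes the expected depth along a single ray using the generic NeRF-style quadrature. It is not taken from the HybridOcc paper: the function name `render_depth`, its arguments, and the handling of the last sample interval are all assumptions made for this example, and the occupancy-aware ray sampling the abstract mentions is not modelled here.

```python
import math

def render_depth(sigmas, ts):
    """Expected depth along one ray (generic NeRF-style quadrature).

    sigmas: non-negative densities at each sample (hypothetical input).
    ts:     increasing sample distances along the ray (hypothetical input).
    Returns (depth, weights), where weights are per-sample rendering weights.
    """
    if len(sigmas) != len(ts) or len(ts) < 2:
        raise ValueError("need at least two samples with matching lengths")

    transmittance = 1.0  # fraction of the ray that has survived so far
    depth = 0.0
    weights = []
    for i, (sigma, t) in enumerate(zip(sigmas, ts)):
        # Interval length; the last sample reuses the previous spacing (assumption).
        delta = ts[i + 1] - t if i + 1 < len(ts) else t - ts[i - 1]
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        w = transmittance * alpha               # contribution of this sample
        weights.append(w)
        depth += w * t
        transmittance *= 1.0 - alpha
    return depth, weights


# One opaque sample at distance 5 on an otherwise empty ray.
ts = [float(i) for i in range(1, 11)]
sigmas = [0.0] * 10
sigmas[4] = 1000.0
depth, weights = render_depth(sigmas, ts)
```

In a depth-supervised setup, a rendered depth of this kind would typically be compared against a reference depth in a loss term; how HybridOcc does this is not specified in the abstract.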
Pages: 7867-7874
Number of pages: 8
Related papers
50 records in total
  • [21] Demo: Real-time 3D visualization of multi-camera room occupancy monitoring for immersive communication systems
    Demeulemeester, Aljosha
    Hollemeersch, Charles-Frederik
    Lambert, Peter
    Van de Walle, Rik
    Jelaca, Vedran
    Gruenwedel, Sebastian
    Nino, Jorge
    Van Cauwelaert, Dimitri
    Veelaert, Peter
    Van Hese, Peter
    Philips, Wilfried
    2011 FIFTH ACM/IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED SMART CAMERAS (ICDSC), 2011
  • [22] Multi-camera Sports Players 3D Localization with Identification Reasoning
    Yang, Yukun
    Zhang, Ruiheng
    Wu, Wanneng
    Peng, Yu
    Xu, Min
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 4497 - 4504
  • [23] HUMAN DETECTION USING MULTI-CAMERA AND 3D SCENE KNOWLEDGE
    Zeng, Chengbin
    Ma, Huadong
    2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2011, : 1793 - 1796
  • [24] The Cluster Approach Applied to Multi-Camera 3D DIC System
    Siebert, Thorsten
    Splitthof, Karsten
    Lomnitz, Marek
    ADVANCEMENT OF OPTICAL METHODS IN EXPERIMENTAL MECHANICS, VOL 3, 2017, : 157 - 163
  • [25] MatrixVT: Efficient Multi-Camera to BEV Transformation for 3D Perception
    Zhou, Hongyu
    Ge, Zheng
    Li, Zeming
    Zhang, Xiangyu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 8514 - 8523
  • [26] Multi-camera 3D ball tracking framework for sports video
    Wu, Wanneng
    Xu, Min
    Liang, Qiaokang
    Mei, Li
    Peng, Yu
    IET IMAGE PROCESSING, 2020, 14 (15) : 3751 - 3761
  • [27] 3D reconstruction of a compressible flow by synchronized multi-camera BOS
    Nicolas, F.
    Donjat, D.
    Leon, O.
    Le Besnerais, G.
    Champagnat, F.
    Micheli, F.
    EXPERIMENTS IN FLUIDS, 2017, 58 (05)
  • [28] A new metrological characterization strategy for 3D multi-camera systems
    Michaela Servi
    Francesco Buonamici
    Luca Puggelli
    Yary Volpe
    International Journal on Interactive Design and Manufacturing (IJIDeM), 2021, 15 : 69 - 72
  • [29] RetryTRACK: Recovering Misses in Multi-Camera 3D Pedestrian Tracking
    de Andrade, Isabella
    Lima, Joao Paulo
    Teichrieb, Veronica
    2024 37TH SIBGRAPI CONFERENCE ON GRAPHICS, PATTERNS AND IMAGES, SIBGRAPI 2024, 2024, : 145 - 150
  • [30] 3D reconstruction of a compressible flow by synchronized multi-camera BOS
    F. Nicolas
    D. Donjat
    O. Léon
    G. Le Besnerais
    F. Champagnat
    F. Micheli
    Experiments in Fluids, 2017, 58