HybridOcc: NeRF Enhanced Transformer-Based Multi-Camera 3D Occupancy Prediction

Cited by: 1
Authors
Zhao, Xiao [1]
Chen, Bo [2]
Sun, Mingyang [1]
Yang, Dingkang [1]
Wang, Youxing [1, 2]
Zhang, Xukun [1]
Li, Mingcheng [1]
Kou, Dongliang [1]
Wei, Xiaoyi [1]
Zhang, Lihua [1, 3, 4]
Affiliations
[1] Fudan Univ, Acad Engn & Technol, Shanghai 200000, Peoples R China
[2] China FAW Grp Corp Ltd, Nanjing 211102, Peoples R China
[3] Minist Educ, Engn Res Ctr AI & Robot, Shanghai 200000, Peoples R China
[4] Jilin Prov Key Lab Intelligence Sci & Engn, Changchun 130000, Peoples R China
Source
Funding
National Key Research and Development Program of China;
Keywords
Computer vision; autonomous driving; neural networks; semantic scene completion; 3D occupancy;
DOI
10.1109/LRA.2024.3416798
Chinese Library Classification number
TP24 [Robotics];
Discipline classification code
080202; 1405;
Abstract
Vision-based 3D semantic scene completion (SSC) describes autonomous driving scenes through 3D volume representations. However, the occlusion of invisible voxels by scene surfaces poses challenges to current SSC methods in hallucinating refined 3D geometry. This letter proposes HybridOcc, a hybrid 3D volume query proposal method generated by a Transformer framework and a NeRF representation and refined in a coarse-to-fine SSC prediction framework. HybridOcc aggregates contextual features through the Transformer paradigm based on hybrid query proposals while combining it with the NeRF representation to obtain depth supervision. The Transformer branch contains multiple scales and uses spatial cross-attention for 2D-to-3D transformation. The newly designed NeRF branch implicitly infers scene occupancy through volume rendering, including visible and invisible voxels, and explicitly captures scene depth rather than generating RGB color. Furthermore, we present an innovative occupancy-aware ray sampling method to orient the SSC task instead of focusing on the scene surface, further improving the overall performance. Extensive experiments on the nuScenes and SemanticKITTI datasets demonstrate the effectiveness of HybridOcc on the SSC task.
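The abstract states that the NeRF branch captures scene depth through volume rendering but gives no formula. As a rough illustration only, the sketch below uses the standard NeRF quadrature to turn per-sample densities along one camera ray into an expected depth. The function name `render_depth`, its arguments, and the handling of the last sample interval are assumptions made for this example; it is not the HybridOcc implementation and does not model the occupancy-aware ray sampling the abstract mentions.

```python
import math


def render_depth(densities, depths):
    """Expected depth along one ray using standard NeRF volume rendering.

    densities: non-negative density (sigma) at each sample (hypothetical input).
    depths:    increasing distances of the samples from the camera.
    Returns (expected_depth, weights), with one weight per sample.
    """
    if len(densities) != len(depths):
        raise ValueError("densities and depths must have the same length")

    transmittance = 1.0  # probability the ray reaches the current sample
    expected_depth = 0.0
    weights = []
    for i, (sigma, t) in enumerate(zip(densities, depths)):
        # Spacing to the next sample; the last sample reuses the previous
        # spacing (an assumption of this sketch).
        if i + 1 < len(depths):
            delta = depths[i + 1] - t
        elif i > 0:
            delta = t - depths[i - 1]
        else:
            delta = 1.0
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this interval
        weight = transmittance * alpha
        weights.append(weight)
        expected_depth += weight * t
        transmittance *= 1.0 - alpha
    return expected_depth, weights


if __name__ == "__main__":
    # An effectively opaque surface at the third sample.
    depth, w = render_depth([0.0, 0.0, 1e9, 0.0], [1.0, 2.0, 3.0, 4.0])
    print(depth, w)
```

In a depth-supervised setup of the kind the abstract describes, a rendered depth like this would typically be compared against a reference depth for the same ray; the specific loss used by the paper is not given in this record.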
Pages: 7867-7874
Number of pages: 8
Related papers
50 in total
  • [1] SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving
    Wei, Yi
    Zhao, Linqing
    Zheng, Wenzhao
    Zhu, Zheng
    Zhou, Jie
    Lu, Jiwen
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21672 - 21683
  • [2] 3DPPE: 3D Point Positional Encoding for Transformer-based Multi-Camera 3D Object Detection
    Shu, Changyong
    Deng, Jiajun
    Yu, Fisher
    Liu, Yifan
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 3557 - 3566
  • [3] AdaptiveOcc: Adaptive Octree-Based Network for Multi-Camera 3D Semantic Occupancy Prediction in Autonomous Driving
    Yang, Tianyu
    Qian, Yeqiang
    Yan, Weihao
    Wang, Chunxiang
    Yang, Ming
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (03) : 2173 - 2187
  • [4] PolarFormer: Multi-Camera 3D Object Detection with Polar Transformer
    Jiang, Yanqin
    Zhang, Li
    Miao, Zhenwei
    Zhu, Xiatian
    Gao, Jin
    Hu, Weimin
    Jiang, Yu-Gang
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 1, 2023, : 1042 - 1050
  • [5] COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction
    Ma, Qihang
    Tan, Xin
    Qu, Yanyun
    Ma, Lizhuang
    Zhang, Zhizhong
    Xie, Yuan
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 19936 - 19945
  • [6] Generalizable Multi-Camera 3D Pedestrian Detection
    Lima, Joao Paulo
    Roberto, Rafael
    Figueiredo, Lucas
    Simoes, Francisco
    Teichrieb, Veronica
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 1232 - 1240
  • [7] Calibrating a multi-camera system for 3D modelling
    Wiles, C
    Davison, A
    IEEE WORKSHOP ON MULTI-VIEW MODELING & ANALYSIS OF VISUAL SCENES (MVIEW'99). PROCEEDINGS, 1999, : 29 - 36
  • [8] Multi-camera system for 3D forensic documentation
    Leipner, Anja
    Baumeister, Rilana
    Thali, Michael J.
    Braun, Marcel
    Dobler, Erika
    Ebert, Lars C.
    FORENSIC SCIENCE INTERNATIONAL, 2016, 261 : 123 - 128
  • [9] TransCAR: Transformer-based Camera-And-Radar Fusion for 3D Object Detection
    Pang, Su
    Morris, Daniel
    Radha, Hayder
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 10902 - 10909
  • [10] 3D Head Reconstruction using Multi-camera Stream
    Kim, Donghoon
    Dahyot, Rozenn
    2009 13TH INTERNATIONAL MACHINE VISION AND IMAGE PROCESSING CONFERENCE, 2009, : 156 - 161