Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View

Cited by: 10
Authors
Wang, Shuo [1]
Zhao, Xinhai [2]
Xu, Hai-Ming [3]
Chen, Zehui [1]
Yu, Dameng [2]
Chang, Jiahao [1]
Yang, Zhen [2]
Zhao, Feng [1]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Huawei Noahs Ark Lab, Montreal, PQ, Canada
[3] Univ Adelaide, Adelaide, SA, Australia
DOI
10.1109/CVPR52729.2023.01281
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-view 3D object detection (MV3D-Det) in Bird-Eye-View (BEV) has drawn extensive attention due to its low cost and high efficiency. Although new algorithms for camera-only 3D object detection are continually being proposed, most of them risk drastic performance degradation when the domain of the input images differs from that of the training data. In this paper, we first analyze the causes of the domain gap for the MV3D-Det task. Based on the covariate shift assumption, we find that the gap is mainly attributable to the feature distribution of BEV, which is determined by the quality of both the depth estimation and the 2D image feature representation. To acquire a robust depth prediction, we propose to decouple the depth estimation from the intrinsic parameters of the camera (i.e., the focal length) by converting the prediction of metric depth to that of scale-invariant depth, and to perform dynamic perspective augmentation that increases the diversity of the extrinsic parameters (i.e., the camera poses) by utilizing homography. Moreover, we modify the focal length values to create multiple pseudo-domains and construct an adversarial training loss to encourage the feature representation to be more domain-agnostic. Without bells and whistles, our approach, namely DG-BEV, successfully alleviates the performance drop on the unseen target domain without impairing the accuracy on the source domain. Extensive experiments on Waymo, nuScenes, and Lyft demonstrate the generalization and effectiveness of our approach.
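The abstract does not give the formula used for scale-invariant depth, so the following is only an illustrative sketch of the general idea. Under a pinhole camera, an object of height H at depth d projects to roughly f * H / d pixels, so image appearance constrains the ratio d / f rather than d itself. The function names and the reference focal length `ref_focal` below are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: the normalization d * ref_focal / f is an assumed
# form, not the formulation confirmed by the abstract.

def apparent_height_px(object_height_m, depth_m, focal_px):
    """Pinhole projection: pixel height of an object at a given depth."""
    return focal_px * object_height_m / depth_m


def to_scale_invariant(depth_m, focal_px, ref_focal=1000.0):
    """Normalize metric depth by focal length (hypothetical reference focal)."""
    if focal_px <= 0:
        raise ValueError("focal length must be positive")
    return depth_m * ref_focal / focal_px


def to_metric(si_depth, focal_px, ref_focal=1000.0):
    """Invert the normalization once the actual focal length is known."""
    if focal_px <= 0:
        raise ValueError("focal length must be positive")
    return si_depth * focal_px / ref_focal


# Two cameras see a 1.5 m tall object with the same pixel height,
# although the metric depths differ by a factor of two.
h_a = apparent_height_px(1.5, 20.0, 1000.0)
h_b = apparent_height_px(1.5, 40.0, 2000.0)

# After normalization both observations map to the same target value.
si_a = to_scale_invariant(20.0, 1000.0)
si_b = to_scale_invariant(40.0, 2000.0)
```

In this sketch a network trained to regress the normalized value would see identical targets for identical image evidence, and metric depth would be recovered afterwards with `to_metric` using the known focal length of the deployed camera.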
Pages: 13333-13342
Number of pages: 10