OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection

Cited by: 0
Authors
Zhang, Hu [1]
Ku, Jianhua [4]
Tang, Tao [5]
Sun, Haiyang [6]
Huang, Xin [2]
Huang, Zi [2]
Yu, Kaicheng [3]
Affiliations
[1] CSIRO DATA61, Sydney, NSW, Australia
[2] Univ Queensland, Brisbane, Qld, Australia
[3] Westlake Univ, Hangzhou, Peoples R China
[4] Alibaba, DAMO Acad, Beijing, Peoples R China
[5] Sun Yat Sen Univ, Shenzhen Campus, Shenzhen, Peoples R China
[6] LiAuto Inc, Beijing, Peoples R China
Source
Keywords
OpenSight; Open-vocabulary; 3D object detection; VOXELNET
DOI
10.1007/978-3-031-72907-2_1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Traditional LiDAR-based object detection research primarily focuses on closed-set scenarios, which falls short in complex real-world applications. Directly transferring existing 2D open-vocabulary models with some known LiDAR classes to obtain open-vocabulary ability, however, tends to suffer from overfitting: the resulting model detects known objects even when presented with a novel category. In this paper, we propose OpenSight, a more advanced 2D-3D modeling framework for LiDAR-based open-vocabulary detection. OpenSight utilizes 2D-3D geometric priors for the initial discernment and localization of generic objects, followed by a more specific semantic interpretation of the detected objects. The process begins by generating 2D boxes for generic objects from the camera images that accompany the LiDAR data. These 2D boxes, together with LiDAR points, are then lifted back into the LiDAR space to estimate corresponding 3D boxes. For better generic object perception, our framework integrates both temporal and spatial-aware constraints. Temporal awareness correlates the predicted 3D boxes across consecutive timestamps, recalibrating missed or inaccurate boxes. Spatial awareness randomly places some "precisely" estimated 3D boxes at varying distances, increasing the visibility of generic objects. To interpret the specific semantics of detected objects, we develop a cross-modal alignment and fusion module that first aligns 3D features with 2D image embeddings and then fuses the aligned 3D-2D features for semantic decoding. Our experiments indicate that our method establishes state-of-the-art open-vocabulary performance on widely used 3D detection benchmarks and effectively identifies objects of new categories of interest.
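The 2D-to-3D lifting step described in the abstract can be illustrated with a minimal sketch. This is not the OpenSight implementation; it assumes a known 3x4 projection matrix `P` (camera intrinsics times the LiDAR-to-camera extrinsics), keeps the LiDAR points whose projections fall inside a 2D box, and fits an axis-aligned 3D box (no yaw) to them after trimming outlying percentiles. The names `project`, `percentile`, `lift_box`, and the `trim` parameter are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]        # (x, y, z) in the LiDAR frame
Box2D = Tuple[float, float, float, float]   # (u_min, v_min, u_max, v_max) in pixels
Box3D = Tuple[float, ...]                   # (cx, cy, cz, dx, dy, dz), axis-aligned


def project(point: Point3D, P: Sequence[Sequence[float]]) -> Optional[Tuple[float, float]]:
    """Project a LiDAR point to pixel coordinates with an assumed 3x4 matrix P = K [R | t].

    Returns None for points at or behind the camera plane (non-positive depth).
    """
    x, y, z = point
    u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
    if w <= 1e-6:
        return None
    return u / w, v / w


def percentile(values: List[float], q: float) -> float:
    """Nearest-rank percentile (q in [0, 100]) computed on a sorted copy."""
    s = sorted(values)
    return s[round(q / 100 * (len(s) - 1))]


def lift_box(points: Sequence[Point3D], P: Sequence[Sequence[float]],
             box2d: Box2D, trim: float = 10.0) -> Optional[Box3D]:
    """Hypothetical lifting: fit an axis-aligned 3D box to the points inside a 2D box's frustum.

    Trimming the lowest/highest `trim` percent per axis is a crude guard against
    background points that also project into the 2D box.
    """
    u0, v0, u1, v1 = box2d
    inside = []
    for p in points:
        uv = project(p, P)
        if uv is not None and u0 <= uv[0] <= u1 and v0 <= uv[1] <= v1:
            inside.append(p)
    if not inside:
        return None  # no LiDAR support for this 2D box
    lo = [percentile([p[i] for p in inside], trim) for i in range(3)]
    hi = [percentile([p[i] for p in inside], 100 - trim) for i in range(3)]
    center = tuple((a + b) / 2 for a, b in zip(lo, hi))
    size = tuple(b - a for a, b in zip(lo, hi))
    return center + size


if __name__ == "__main__":
    # Toy setup: identity projection, so the LiDAR frame doubles as the camera frame.
    P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    pts = [(0, 0, 10), (1, 0, 10), (0, 1, 11), (1, 1, 11), (50, 50, 10), (0, 0, -5)]
    print(lift_box(pts, P, (-0.5, -0.5, 0.5, 0.5), trim=0.0))  # (0.5, 0.5, 10.5, 1, 1, 1)
```

In the toy example, the point at (50, 50, 10) projects outside the 2D box and the point at depth -5 lies behind the camera, so only the four clustered points define the box. A real detector would also need to estimate orientation and handle occlusion; the abstract's temporal and spatial-aware constraints address refinement beyond what this single-frame sketch covers.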
Pages: 1-19
Page count: 19