A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Cited by: 2
Authors
Zhu, Chaoyang [1]
Chen, Long [1]
Affiliation
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE;
DOI
10.1109/TPAMI.2024.3413013
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Due to the expensive manual labeling cost, the annotated categories in existing datasets are often small-scale and pre-defined, so state-of-the-art fully supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, in the last few years, the community has witnessed increasing attention toward Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that the permission and usage of weak supervision signals discriminate well between different methodologies, including: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, and 3D and video understanding. The main design principles, key challenges, development routes, and methodology strengths and weaknesses are thoroughly analyzed.
Pages: 8954-8975
Page count: 22
Related papers
50 in total
  • [11] Towards Open-Vocabulary Video Instance Segmentation
    Wang, Haochen
    Yan, Cilin
    Wang, Shuai
    Jiang, Xiaolong
    Tang, Xu
    Hu, Yao
    Xie, Weidi
    Gavves, Efstratios
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 4034 - 4043
  • [12] MasQCLIP for Open-Vocabulary Universal Image Segmentation
    Xu, Xin
    Xiong, Tianyi
    Ding, Zheng
    Tu, Zhuowen
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 887 - 898
  • [13] Hierarchical Open-vocabulary Universal Image Segmentation
    Wang, Xudong
    Li, Shufan
    Kallidromitis, Konstantinos
    Kato, Yusuke
    Kozuka, Kazuki
    Darrell, Trevor
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [14] Scaling Open-Vocabulary Object Detection
    Minderer, Matthias
    Gritsenko, Alexey
    Houlsby, Neil
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [15] Simple Open-Vocabulary Object Detection
    Minderer, Matthias
    Gritsenko, Alexey
    Stone, Austin
    Neumann, Maxim
    Weissenborn, Dirk
    Dosovitskiy, Alexey
    Mahendran, Aravindh
    Arnab, Anurag
    Dehghani, Mostafa
    Shen, Zhuoran
    Wang, Xiao
    Zhai, Xiaohua
    Kipf, Thomas
    Houlsby, Neil
    COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 : 728 - 755
  • [16] Side Adapter Network for Open-Vocabulary Semantic Segmentation
    Xu, Mengde
    Zhang, Zheng
    Wei, Fangyun
    Hu, Han
    Bai, Xiang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 2945 - 2954
  • [17] FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation
    Qin, Jie
    Wu, Jie
    Yan, Pengxiang
    Li, Ming
    Ren, Yuxi
    Xiao, Xuefeng
    Wang, Yitong
    Wang, Rui
    Wen, Shilei
    Pan, Xin
    Wang, Xingang
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 19446 - 19455
  • [18] Open-Vocabulary RGB-Thermal Semantic Segmentation
    Zhao, Guoqiang
    Huang, Junjie
    Yan, Xiaoyun
    Wang, Zhaojing
    Tang, Junwei
    Ou, Yangjun
    Hu, Xinrong
    Peng, Tao
    COMPUTER VISION - ECCV 2024, PT LXXIV, 2025, 15132 : 304 - 320
  • [19] Open-Vocabulary Segmentation with Semantic-Assisted Calibration
    Liu, Yong
    Bai, Sule
    Li, Guanbin
    Wang, Yitong
    Tang, Yansong
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 3491 - 3500
  • [20] Enhancing Open-Vocabulary Semantic Segmentation with Prototype Retrieval
    Barsellotti, Luca
    Amoroso, Roberto
    Baraldi, Lorenzo
    Cucchiara, Rita
    IMAGE ANALYSIS AND PROCESSING, ICIAP 2023, PT II, 2023, 14234 : 196 - 208