A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

Cited by: 2
Authors
Zhu, Chaoyang [1]
Chen, Long [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE;
DOI
10.1109/TPAMI.2024.3413013
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in the deep learning era. Due to the expensive manual labeling cost, the annotated categories in existing datasets are often small-scale and pre-defined, i.e., state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, in the last few years, the community has witnessed increasing attention toward Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review of recent developments in OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that the permission and usage of weak supervision signals can well discriminate different methodologies, including: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, 3D and video understanding. The main design principles, key challenges, development routes, methodology strengths, and weaknesses are thoroughly analyzed.
Pages: 8954-8975
Page count: 22
Related papers
50 in total
  • [1] A Simple Framework for Open-Vocabulary Segmentation and Detection
    Zhang, Hao
    Li, Feng
    Zou, Xueyan
    Liu, Shilong
    Li, Chunyuan
    Yang, Jianwei
    Zhang, Lei
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1020 - 1031
  • [2] Open-Vocabulary And Multitask Image Segmentation
    Pan, Lihu
    Yang, Yunting
    Wang, Zhengkui
    Shan, Wen
    Yin, Jaili
    39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024, : 1048 - 1049
  • [3] Diffusion Models for Open-Vocabulary Segmentation
    Karazija, Laurynas
    Laina, Iro
    Vedaldi, Andrea
    Rupprecht, Christian
    COMPUTER VISION - ECCV 2024, PT V, 2025, 15063 : 299 - 317
  • [4] Open-Vocabulary Camouflaged Object Segmentation
    Pang, Youwei
    Zhao, Xiaoqi
    Zuo, Jiaming
    Zhang, Lihe
    Lu, Huchuan
    COMPUTER VISION - ECCV 2024, PT XLVII, 2025, 15105 : 476 - 495
  • [5] Open-vocabulary Attribute Detection
    Bravo, Maria A.
    Mittal, Sudhanshu
    Ging, Simon
    Brox, Thomas
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7041 - 7050
  • [6] Open-vocabulary Panoptic Segmentation with Embedding Modulation
    Chen, Xi
    Li, Shuang
    Lim, Ser-Nam
    Torralba, Antonio
    Zhao, Hengshuang
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1141 - 1150
  • [7] Generalization Boosted Adapter for Open-Vocabulary Segmentation
    Xu, Wenhao
    Wang, Changwei
    Feng, Xuxiang
    Xu, Rongtao
    Huang, Longzhao
    Zhang, Zherui
    Guo, Li
    Xu, Shibiao
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2025, 35 (01) : 520 - 533
  • [8] Open-vocabulary Object Segmentation with Diffusion Models
    Li, Ziyi
    Zhou, Qinye
    Zhang, Xiaoyun
    Zhang, Ya
    Wang, Yanfeng
    Xie, Weidi
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 7633 - 7642
  • [9] Open-Vocabulary Object Detection With an Open Corpus
    Wang, Jiong
    Zhang, Huiming
    Hong, Haiwen
    Jin, Xuan
    He, Yuan
    Xue, Hui
    Zhao, Zhou
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 6736 - 6746
  • [10] Going Denser with Open-Vocabulary Part Segmentation
    Sun, Peize
    Chen, Shoufa
    Zhu, Chenchen
    Xiao, Fanyi
    Luo, Ping
    Xie, Saining
    Yan, Zhicheng
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 15407 - 15419