A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future

被引:2
|
作者
Zhu, Chaoyang [1 ]
Chen, Long [1 ]
机构
[1] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Kowloon, Hong Kong, Peoples R China
关键词
Open-vocabulary; zero-shot learning; object detection; image segmentation; future directions; OBJECT; LANGUAGE;
D O I
10.1109/TPAMI.2024.3413013
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
As the most fundamental scene understanding tasks, object detection and segmentation have made tremendous progress in deep learning era. Due to the expensive manual labeling cost, the annotated categories in existing datasets are often small-scale and pre-defined, i.e., state-of-the-art fully-supervised detectors and segmentors fail to generalize beyond the closed vocabulary. To resolve this limitation, in the last few years, the community has witnessed an increasing attention toward Open-Vocabulary Detection (OVD) and Segmentation (OVS). By "open-vocabulary", we mean that the models can classify objects beyond pre-defined categories. In this survey, we provide a comprehensive review on recent developments of OVD and OVS. A taxonomy is first developed to organize different tasks and methodologies. We find that the permission and usage of weak supervision signals can well discriminate different methodologies, including: visual-semantic space mapping, novel visual feature synthesis, region-aware training, pseudo-labeling, knowledge distillation, and transfer learning. The proposed taxonomy is universal across different tasks, covering object detection, semantic/instance/panoptic segmentation, 3D and video understanding. The main design principles, key challenges, development routes, methodology strengths, and weaknesses are thoroughly analyzed.
引用
收藏
页码:8954 / 8975
页数:22
相关论文
共 50 条
  • [31] Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation
    Fang, Hao
    Wu, Peng
    Li, Yawei
    Zhang, Xinxin
    Lu, Xiankai
    COMPUTER VISION - ECCV 2024, PT LXX, 2025, 15128 : 225 - 241
  • [32] OV-VIS: Open-Vocabulary Video Instance Segmentation
    Wang, Haochen
    Yan, Cilin
    Chen, Keyan
    Jiang, Xiaolong
    Tang, Xu
    Hu, Yao
    Kang, Guoliang
    Xie, Weidi
    Gavves, Efstratios
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (11) : 5048 - 5065
  • [33] Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP
    Liang, Feng
    Wu, Bichen
    Dai, Xiaoliang
    Li, Kunpeng
    Zhao, Yinan
    Zhang, Hang
    Zhang, Peizhao
    Vajda, Peter
    Marculescu, Diana
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 7061 - 7070
  • [34] SAN: Side Adapter Network for Open-Vocabulary Semantic Segmentation
    Xu, Mengde
    Zhang, Zheng
    Wei, Fangyun
    Hu, Han
    Bai, Xiang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (12) : 15546 - 15561
  • [35] Image-text aggregation for open-vocabulary semantic segmentation
    Cheng, Shengyang
    Huang, Jianyong
    Wang, Xiaodong
    Huang, Lei
    Wei, Zhiqiang
    NEUROCOMPUTING, 2025, 630
  • [36] Expanding the Horizons: Exploring Further Steps in Open-Vocabulary Segmentation
    Wang, Xihua
    Ji, Lei
    Yan, Kun
    Sun, Yuchong
    Song, Ruihua
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT X, 2024, 14434 : 407 - 419
  • [37] OV-PARTS: Towards Open-Vocabulary Part Segmentation
    Wei, Meng
    Yue, Xiaoyu
    Zhang, Wenwei
    Kong, Shu
    Liu, Xihui
    Pang, Jiangmiao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [38] TAG: Guidance-Free Open-Vocabulary Semantic Segmentation
    Kawano, Yasufumi
    Aoki, Yoshimitsu
    IEEE ACCESS, 2024, 12 : 88322 - 88331
  • [39] LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation
    Shi, Hengcan
    Dao, Son Duy
    Cai, Jianfei
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025, 133 (02) : 742 - 759
  • [40] Aligning Bag of Regions for Open-Vocabulary Object Detection
    Wu, Size
    Zhang, Wenwei
    Jin, Sheng
    Liu, Wentao
    Loy, Chen Change
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15254 - 15264