Single-stage zero-shot object detection network based on CLIP and pseudo-labeling

Cited by: 2
Authors
Li, Jiafeng [1,2]
Sun, Shengyao [1,2]
Zhang, Kang [1,2]
Zhang, Jing [1,2]
Zhuo, Li [1,2]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Zero-shot detection; Single-stage; CLIP; Pseudo-labeling;
DOI
10.1007/s13042-024-02321-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Detecting unknown objects is a challenging task in computer vision because real-world object categories are diverse, whereas existing object-detection training sets cover only a limited number of them. Most existing approaches use two-stage networks to improve a model's ability to characterize objects of unknown classes, which leads to slow inference. To address this issue, we propose a single-stage unknown object detection method based on the contrastive language-image pre-training (CLIP) model and pseudo-labeling, called CLIP-YOLO. First, a visual-language embedding alignment method is introduced, and a channel-grouped enhanced coordinate attention module is embedded into a YOLO-series detection head and feature-enhancing component, to improve the model's ability to characterize and detect objects of unknown categories. Second, pseudo-label generation is optimized based on the CLIP model to expand the diversity of the training set and improve coverage of unknown object categories. We validated this method on four challenging datasets: MSCOCO, ILSVRC, Visual Genome, and PASCAL VOC. The results show that our method achieves higher accuracy and faster inference, and therefore better unknown object detection performance. The source code is available at https://github.com/BJUTsipl/CLIP-YOLO.
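The general idea behind CLIP-based pseudo-labeling can be illustrated with a minimal sketch. It is not the CLIP-YOLO implementation (see the repository above for that): the function names, the three-dimensional toy embeddings, the temperature, and the confidence threshold are all assumptions made for illustration. A region embedding is compared with one text embedding per class name by cosine similarity, and the best class is kept as a pseudo-label only when its softmax probability is high enough.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_label(region_embedding, text_embeddings,
                 temperature=0.01, min_confidence=0.8):
    """Return (class_name, probability), or None if the match is too uncertain.

    temperature and min_confidence are hypothetical values chosen for this
    toy example, not settings taken from the paper.
    """
    names = list(text_embeddings)
    logits = [cosine(region_embedding, text_embeddings[n]) / temperature
              for n in names]
    probs = softmax(logits)
    best = max(range(len(names)), key=probs.__getitem__)
    if probs[best] >= min_confidence:
        return names[best], probs[best]
    return None  # ambiguous region: do not add it to the training set

# Toy stand-ins for text embeddings of class names (assumed, not real CLIP output).
TEXT_EMBEDDINGS = {
    "zebra": [1.0, 0.0, 0.0],
    "bus": [0.0, 1.0, 0.0],
    "kite": [0.0, 0.0, 1.0],
}

confident = pseudo_label([0.9, 0.1, 0.0], TEXT_EMBEDDINGS)
ambiguous = pseudo_label([0.5, 0.5, 0.0], TEXT_EMBEDDINGS)
print(confident, ambiguous)
```

In this sketch, a region that lies close to one class embedding receives a pseudo-label, while a region equally similar to two classes is rejected. Filtering by confidence is one common way to limit label noise when a training set is expanded with automatically generated labels.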
Pages: 1055-1070
Number of pages: 16
Related papers
50 in total
  • [1] Zero-Shot Object Detection
    Bansal, Ankan
    Sikka, Karan
    Sharma, Gaurav
    Chellappa, Rama
    Divakaran, Ajay
    COMPUTER VISION - ECCV 2018, PT I, 2018, 11205 : 397 - 414
  • [2] Zero-shot object detection with contrastive semantic association network
    Haohe Li
    Chong Wang
    Weijie Liu
    Yilin Gong
    Xinmiao Dai
    Applied Intelligence, 2023, 53 : 30056 - 30068
  • [3] ZERO-SHOT OBJECT DETECTION WITH TRANSFORMERS
    Zheng, Ye
    Cui, Li
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 444 - 448
  • [4] A Survey of Zero-Shot Object Detection
    Cao, Weipeng
    Yao, Xuyang
    Xu, Zhiwu
    Liu, Ye
    Pan, Yinghui
    Ming, Zhong
    BIG DATA MINING AND ANALYTICS, 2025, 8 (03): : 726 - 750
  • [5] Zero-shot object detection with contrastive semantic association network
    Li, Haohe
    Wang, Chong
    Liu, Weijie
    Gong, Yilin
    Dai, Xinmiao
    APPLIED INTELLIGENCE, 2023, 53 (24) : 30056 - 30068
  • [6] GTNet: Generative Transfer Network for Zero-Shot Object Detection
    Zhao, Shizhen
    Gao, Changxin
    Shao, Yuanjie
    Li, Lerenhan
    Yu, Changqian
    Ji, Zhong
    Sang, Nang
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 12967 - 12974
  • [7] Zero-Shot Camouflaged Object Detection
    Li, Haoran
    Feng, Chun-Mei
    Xu, Yong
    Zhou, Tao
    Yao, Lina
    Chang, Xiaojun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 5126 - 5137
  • [8] Decoupled Metric Network for Single-Stage Few-Shot Object Detection
    Lu, Yue
    Chen, Xingyu
    Wu, Zhengxing
    Yu, Junzhi
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (01) : 514 - 525
  • [9] Semantics-Guided Contrastive Network for Zero-Shot Object Detection
    Yan, Caixia
    Chang, Xiaojun
    Luo, Minnan
    Liu, Huan
    Zhang, Xiaoqin
    Zheng, Qinghua
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (03) : 1530 - 1544
  • [10] Visual Language Based Succinct Zero-Shot Object Detection
    Zheng, Ye
    Huang, Xi
    Cui, Li
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5410 - 5418