Single-stage zero-shot object detection network based on CLIP and pseudo-labeling

Authors
Li, Jiafeng [1, 2]
Sun, Shengyao [1, 2]
Zhang, Kang [1, 2]
Zhang, Jing [1, 2]
Zhuo, Li [1, 2]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Zero-shot detection; Single-stage; CLIP; Pseudo-labeling;
DOI
10.1007/s13042-024-02321-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Detecting unknown objects is a challenging task in computer vision: real-world object categories are highly diverse, yet existing object-detection training sets cover only a limited number of them. Most existing approaches use two-stage networks to improve a model's ability to characterize objects of unknown classes, which leads to slow inference. To address this issue, we propose CLIP-YOLO, a single-stage unknown object detection method based on the contrastive language-image pre-training (CLIP) model and pseudo-labeling. First, a visual-language embedding alignment method is introduced, and a channel-grouped enhanced coordinate attention module is embedded into the detection head and feature-enhancing component of a YOLO-series detector, to improve the model's ability to characterize and detect objects of unknown categories. Second, pseudo-label generation is optimized based on the CLIP model to expand the diversity of the training set and improve its coverage of unknown object categories. We validated the method on four challenging datasets: MSCOCO, ILSVRC, Visual Genome, and PASCAL VOC. The results show that our method achieves higher accuracy and faster inference, yielding better unknown object detection performance. The source code is available at https://github.com/BJUTsipl/CLIP-YOLO.
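The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind CLIP-style embedding alignment and pseudo-labeling, not the authors' method. It assumes that a detector produces one embedding per candidate region and that each class name has a text embedding in the same space. The function names, the toy three-dimensional embeddings, the temperature of 0.01, and the 0.8 confidence threshold are all illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_scores(region_embedding, text_embeddings):
    """Cosine similarity between one region embedding and each class text embedding."""
    region = l2_normalize(region_embedding)
    return {
        name: sum(r * t for r, t in zip(region, l2_normalize(text)))
        for name, text in text_embeddings.items()
    }


def softmax(scores, temperature=0.01):
    """Turn similarities into probabilities; the temperature value is an assumption."""
    top = max(scores.values())
    exps = {k: math.exp((v - top) / temperature) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def pseudo_label(region_embedding, text_embeddings, threshold=0.8):
    """Return (class_name, probability) if the best match is confident, else None."""
    probs = softmax(cosine_scores(region_embedding, text_embeddings))
    best = max(probs, key=probs.get)
    return (best, probs[best]) if probs[best] >= threshold else None


# Toy stand-ins for CLIP text embeddings (hypothetical values).
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "zebra": [0.0, 0.0, 1.0],
}

confident_region = [0.1, 0.05, 0.9]   # clearly closest to "zebra"
ambiguous_region = [0.5, 0.5, 0.0]    # equally close to "cat" and "dog"

print(pseudo_label(confident_region, text_embeddings))
print(pseudo_label(ambiguous_region, text_embeddings))
```

In this sketch, a region that matches one class text embedding far better than the others receives a pseudo-label, while an ambiguous region is discarded. The repository linked in the abstract is the place to check how CLIP-YOLO actually performs these steps.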
Pages: 1055-1070
Page count: 16