Single-stage zero-shot object detection network based on CLIP and pseudo-labeling

Cited by: 2
Authors
Li, Jiafeng [1, 2]
Sun, Shengyao [1, 2]
Zhang, Kang [1, 2]
Zhang, Jing [1, 2]
Zhuo, Li [1, 2]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Zero-shot detection; Single-stage; CLIP; Pseudo-labeling
DOI
10.1007/s13042-024-02321-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Detecting unknown objects is a challenging task in computer vision because real-world object categories are diverse, whereas existing object-detection training sets cover only a limited number of them. Most existing approaches use two-stage networks to improve a model's ability to characterize objects of unknown classes, which leads to slow inference. To address this issue, we propose a single-stage unknown object detection method based on the contrastive language-image pre-training (CLIP) model and pseudo-labeling, called CLIP-YOLO. First, a visual-language embedding alignment method is introduced, and a channel-grouped enhanced coordinate attention module is embedded into the detection head and feature-enhancing component of a YOLO-series detector, to improve the model's ability to characterize and detect objects of unknown categories. Second, pseudo-label generation is optimized based on the CLIP model to expand the diversity of the training set and improve coverage of unknown object categories. We validated this method on four challenging datasets: MSCOCO, ILSVRC, Visual Genome, and PASCAL VOC. The results show that our method achieves higher accuracy and faster inference, yielding better unknown object detection performance. The source code is available at https://github.com/BJUTsipl/CLIP-YOLO.
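The general idea behind CLIP-based pseudo-labeling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it only shows the common pattern of scoring a region embedding against class-name text embeddings by cosine similarity and keeping the label only when the match is confident. The 3-dimensional embeddings, the class names, the temperature, and the `min_confidence` threshold are all made-up assumptions for illustration; a real system would obtain embeddings from CLIP's image and text encoders.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_scores(region_embedding, class_text_embeddings):
    """Cosine similarity between one region embedding and each class text embedding."""
    region = l2_normalize(region_embedding)
    return {
        name: sum(r * t for r, t in zip(region, l2_normalize(text)))
        for name, text in class_text_embeddings.items()
    }


def softmax(scores, temperature=0.01):
    """Temperature-scaled softmax over a dict of scores (temperature is an assumed value)."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp((v - top) / temperature) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def pseudo_label(region_embedding, class_text_embeddings, min_confidence=0.8):
    """Return (class_name, confidence) if the best match is confident enough, else None.

    min_confidence is a hypothetical threshold chosen for this example.
    """
    probs = softmax(cosine_scores(region_embedding, class_text_embeddings))
    best = max(probs, key=probs.get)
    if probs[best] >= min_confidence:
        return best, probs[best]
    return None


# Toy stand-ins for text embeddings of class names (assumed values, not CLIP outputs).
TEXT_EMBEDDINGS = {
    "zebra": [1.0, 0.0, 0.0],
    "giraffe": [0.0, 1.0, 0.0],
    "sofa": [0.0, 0.0, 1.0],
}

clear_region = [0.9, 0.1, 0.0]      # close to "zebra": kept as a pseudo-label
ambiguous_region = [0.5, 0.5, 0.0]  # equally close to two classes: discarded

print(pseudo_label(clear_region, TEXT_EMBEDDINGS))
print(pseudo_label(ambiguous_region, TEXT_EMBEDDINGS))
```

Discarding ambiguous regions rather than assigning the top class is one simple way to limit label noise when pseudo-labels are added to a training set; the paper's own optimization of pseudo-label generation may differ.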
Pages: 1055-1070
Number of pages: 16