Single-stage zero-shot object detection network based on CLIP and pseudo-labeling

Cited by: 2
Authors
Li, Jiafeng [1, 2]
Sun, Shengyao [1, 2]
Zhang, Kang [1, 2]
Zhang, Jing [1, 2]
Zhuo, Li [1, 2]
Affiliations
[1] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[2] Beijing Univ Technol, Beijing Key Lab Computat Intelligence & Intelligen, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation
Keywords
Zero-shot detection; Single-stage; CLIP; Pseudo-labeling
DOI
10.1007/s13042-024-02321-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Detecting unknown objects is a challenging task in computer vision: real-world object categories are highly diverse, yet existing object-detection training sets cover only a limited number of them. Most existing approaches use two-stage networks to improve a model's ability to characterize objects of unknown classes, which leads to slow inference. To address this issue, we propose CLIP-YOLO, a single-stage unknown object detection method based on the contrastive language-image pre-training (CLIP) model and pseudo-labeling. First, a visual-language embedding alignment method is introduced, and a channel-grouped enhanced coordinate attention module is embedded into the detection head and feature-enhancing component of a YOLO-series detector, to improve the model's ability to characterize and detect objects of unknown categories. Second, pseudo-label generation is optimized based on the CLIP model to expand the diversity of the training set and improve coverage of unknown object categories. We validated the method on four challenging datasets: MSCOCO, ILSVRC, Visual Genome, and PASCAL VOC. The results show that our method achieves higher accuracy and faster inference, and thus better unknown object detection performance. The source code is available at https://github.com/BJUTsipl/CLIP-YOLO.
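The pseudo-labeling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that region and class-name embeddings have already been produced by a CLIP-style image and text encoder, and the function names (`cosine`, `softmax`, `pseudo_label`), the confidence threshold, and the temperature are all hypothetical choices made for illustration. A region is given a pseudo-label only when its softmax-normalized similarity to one class name is high enough.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(scores, temperature):
    """Numerically stable softmax over temperature-scaled scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(region_embeddings, text_embeddings, class_names,
                 threshold=0.8, temperature=0.01):
    """Assign a class name to each region whose best match is confident.

    The embeddings are assumed to come from a CLIP-style encoder pair;
    threshold and temperature are illustrative values, not from the paper.
    Returns a list of (region_index, class_name, probability) tuples.
    """
    labels = []
    for idx, region in enumerate(region_embeddings):
        sims = [cosine(region, text) for text in text_embeddings]
        probs = softmax(sims, temperature)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            labels.append((idx, class_names[best], probs[best]))
    return labels


# Toy embeddings standing in for real encoder outputs.
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
names = ["zebra", "kite"]
labels = pseudo_label(regions, texts, names)
print(labels)
```

In this toy input the third region is equally similar to both class names, so it falls below the threshold and receives no pseudo-label; dropping ambiguous regions like this is one common way to limit label noise.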
Pages: 1055-1070
Number of pages: 16