CLIP-Count: Towards Text-Guided Zero-Shot Object Counting

Cited by: 15
Authors
Jiang, Ruixiang [1 ]
Liu, Lingbo [1 ]
Chen, Changwen [1 ]
Affiliations
[1] Hong Kong Polytech Univ, HKSAR, Hong Kong, Peoples R China
Keywords
class-agnostic object counting; CLIP; zero-shot; text-guided
DOI
10.1145/3581783.3611789
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Recent advances in visual-language models have shown remarkable zero-shot text-image matching ability that is transferable to downstream tasks such as object detection and segmentation. Adapting these models for object counting, however, remains a formidable challenge. In this study, we first investigate transferring vision-language models (VLMs) for class-agnostic object counting. Specifically, we propose CLIP-Count, the first end-to-end pipeline that estimates density maps for open-vocabulary objects with text guidance in a zero-shot manner. To align the text embedding with dense visual features, we introduce a patch-text contrastive loss that guides the model to learn informative patch-level visual representations for dense prediction. Moreover, we design a hierarchical patch-text interaction module to propagate semantic information across different resolution levels of visual features. Benefiting from the full exploitation of the rich image-text alignment knowledge of pretrained VLMs, our method effectively generates high-quality density maps for objects-of-interest. Extensive experiments on FSC-147, CARPK, and ShanghaiTech crowd counting datasets demonstrate state-of-the-art accuracy and generalizability of the proposed method. Code is available: https://github.com/songrise/CLIP-Count.
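The abstract describes a patch-text contrastive loss that aligns a text embedding with patch-level visual features. As a rough illustration only (not the paper's exact formulation — the function name, the binary positive-patch mask, and the temperature value are all assumptions here), an InfoNCE-style variant of such a loss can be sketched as: normalize patch features and the text embedding, compute cosine-similarity logits, and penalize the model when softmax probability mass falls on patches that do not contain the queried object.

```python
import numpy as np

def patch_text_contrastive_loss(patch_feats, text_feat, pos_mask, tau=0.07):
    """Hypothetical InfoNCE-style patch-text contrastive loss.

    patch_feats: (N, D) array of patch-level visual features
    text_feat:   (D,) text embedding for the object category
    pos_mask:    (N,) boolean mask of patches containing the object
    tau:         softmax temperature (assumed value)
    """
    # L2-normalize so dot products become cosine similarities
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    sims = (p @ t) / tau                      # (N,) similarity logits
    # Softmax over all patches; loss is -log of the mass on positives
    probs = np.exp(sims - sims.max())
    probs /= probs.sum()
    return -np.log(probs[pos_mask].sum() + 1e-8)

# Toy check: patches close to the text embedding should yield a low loss
rng = np.random.default_rng(0)
text = rng.normal(size=16)
pos = text + 0.1 * rng.normal(size=(4, 16))   # "object" patches near text
neg = rng.normal(size=(12, 16))               # unrelated background patches
feats = np.vstack([pos, neg])
mask = np.arange(16) < 4
loss = patch_text_contrastive_loss(feats, text, mask)
```

With this setup, labeling the aligned patches as positives produces a much smaller loss than labeling the random background patches as positives, which is the gradient signal that would push patch features toward the matching text embedding.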
Pages: 4535-4545
Page count: 11
Related Papers (50 total)
  • [21] A Survey of Zero-Shot Object Detection
    Cao, Weipeng
    Yao, Xuyang
    Xu, Zhiwu
    Liu, Ye
    Pan, Yinghui
    Ming, Zhong
    BIG DATA MINING AND ANALYTICS, 2025, 8 (03): : 726 - 750
  • [22] Zero-Shot Camouflaged Object Detection
    Li, Haoran
    Feng, Chun-Mei
    Xu, Yong
    Zhou, Tao
    Yao, Lina
    Chang, Xiaojun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 5126 - 5137
  • [23] Improving Zero-Shot Generalization for CLIP with Variational Adapter
    Lu, Ziqian
    Shen, Fengli
    Liu, Mushui
    Yu, Yunlong
    Li, Xi
    COMPUTER VISION - ECCV 2024, PT XX, 2025, 15078 : 328 - 344
  • [24] SAVE: Self-Attention on Visual Embedding for Zero-Shot Generic Object Counting
    Zgaren, Ahmed
    Bouachir, Wassim
    Bouguila, Nizar
    JOURNAL OF IMAGING, 2025, 11 (02)
  • [25] Towards Zero-shot Language Modeling
    Ponti, Edoardo M.
    Vulic, Ivan
    Cotterell, Ryan
    Reichart, Roi
    Korhonen, Anna
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 2900 - +
  • [26] Towards Open Zero-Shot Learning
    Marmoreo, Federico
    Carrazco, Julio Ivan Davila
    Cavazza, Jacopo
    Murino, Vittorio
    IMAGE ANALYSIS AND PROCESSING, ICIAP 2022, PT II, 2022, 13232 : 564 - 575
  • [27] Zero-Shot Text-to-Image Generation
    Ramesh, Aditya
    Pavlov, Mikhail
    Goh, Gabriel
    Gray, Scott
    Voss, Chelsea
    Radford, Alec
    Chen, Mark
    Sutskever, Ilya
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [28] Retrieval Augmented Zero-Shot Text Classification
    Abdullahi, Tassallah
    Singh, Ritambhara
    Eickhoff, Carsten
    PROCEEDINGS OF THE 2024 ACM SIGIR INTERNATIONAL CONFERENCE ON THE THEORY OF INFORMATION RETRIEVAL, ICTIR 2024, 2024, : 195 - 203
  • [29] Zero-Shot Object Detection for Indoor Robots
    Abdalwhab, Abdalwhab
    Liu, Huaping
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [30] Zero-Shot Object Detection with Textual Descriptions
    Li, Zhihui
    Yao, Lina
    Zhang, Xiaoqin
    Wang, Xianzhi
    Kanhere, Salil
    Zhang, Huaxiang
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8690 - 8697