CLIP-Count: Towards Text-Guided Zero-Shot Object Counting

Cited by: 15
Authors
Jiang, Ruixiang [1 ]
Liu, Lingbo [1 ]
Chen, Changwen [1 ]
Affiliations
[1] Hong Kong Polytech Univ, HKSAR, Hong Kong, Peoples R China
Keywords
class-agnostic object counting; CLIP; zero-shot; text-guided
DOI
10.1145/3581783.3611789
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Recent advances in visual-language models have shown remarkable zero-shot text-image matching ability that is transferable to downstream tasks such as object detection and segmentation. Adapting these models for object counting, however, remains a formidable challenge. In this study, we first investigate transferring vision-language models (VLMs) for class-agnostic object counting. Specifically, we propose CLIP-Count, the first end-to-end pipeline that estimates density maps for open-vocabulary objects with text guidance in a zero-shot manner. To align the text embedding with dense visual features, we introduce a patch-text contrastive loss that guides the model to learn informative patch-level visual representations for dense prediction. Moreover, we design a hierarchical patch-text interaction module to propagate semantic information across different resolution levels of visual features. Benefiting from the full exploitation of the rich image-text alignment knowledge of pretrained VLMs, our method effectively generates high-quality density maps for objects-of-interest. Extensive experiments on FSC-147, CARPK, and ShanghaiTech crowd counting datasets demonstrate state-of-the-art accuracy and generalizability of the proposed method. Code is available: https://github.com/songrise/CLIP-Count.
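To make the abstract's patch-text alignment idea concrete, the sketch below illustrates one plausible form of a patch-text contrastive loss in PyTorch. This is a minimal sketch, not the authors' released implementation (see the linked repository for that); the function name patch_text_contrastive_loss, the temperature value, and the use of a binary foreground mask derived from ground-truth annotations are assumptions made purely for illustration.

import torch
import torch.nn.functional as F

def patch_text_contrastive_loss(patch_feats, text_feat, fg_mask, temperature=0.07):
    """Illustrative patch-text contrastive loss (hypothetical, not the official CLIP-Count code).

    patch_feats: (B, N, D) patch-level visual embeddings from the image encoder
    text_feat:   (B, D)    text embedding of the class prompt, e.g. "a photo of strawberries"
    fg_mask:     (B, N)    1 for patches containing the counted object, 0 for background
    """
    patch_feats = F.normalize(patch_feats, dim=-1)
    text_feat = F.normalize(text_feat, dim=-1)

    # Cosine similarity between the text anchor and every patch, scaled by temperature.
    logits = torch.einsum("bnd,bd->bn", patch_feats, text_feat) / temperature

    # Softmax over patches: foreground (positive) patches compete against
    # background (negative) patches for similarity with the text embedding.
    log_probs = F.log_softmax(logits, dim=-1)

    # Target distribution: uniform over foreground patches.
    target = fg_mask.float()
    target = target / target.sum(dim=-1, keepdim=True).clamp(min=1.0)

    return -(target * log_probs).sum(dim=-1).mean()

# Toy usage with random tensors standing in for ViT patch tokens and a CLIP text embedding.
B, N, D = 2, 196, 512
loss = patch_text_contrastive_loss(
    torch.randn(B, N, D), torch.randn(B, D), torch.randint(0, 2, (B, N))
)
print(loss.item())

In a full pipeline, patch_feats would come from a CLIP vision transformer, text_feat from the CLIP text encoder, and the foreground mask from the annotated density map; those wiring details are omitted in this sketch.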
Pages: 4535-4545
Number of pages: 11