Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-label Medical Image Classification

Cited: 0
|
Authors
Ye, Yaoqin [1 ]
Zhang, Junjie [1 ]
Shi, Hongwei [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China
Keywords
Prompt Learning; Medical Image Recognition; Multi-label Classification; Vision-Language Models;
DOI
10.1007/978-981-97-8496-7_20
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The task of medical image recognition is notably complicated by the presence of varied and multiple pathological indications, presenting a unique challenge in multi-label classification with unseen labels. This complexity underlines the need for computer-aided diagnosis methods employing multi-label zero-shot learning. Recent advancements in pre-trained vision-language models (VLMs) have showcased notable zero-shot classification abilities on medical images. However, these methods are limited in leveraging the extensive pre-trained knowledge available from broader datasets, and often depend on manual prompt construction by expert radiologists. By automating the process of prompt tuning, prompt learning techniques have emerged as an efficient way to adapt VLMs to downstream tasks. Yet, existing CoOp-based strategies fall short in generating class-specific prompts for unseen categories, limiting generalizability in fine-grained scenarios. To overcome these constraints, we introduce a novel prompt generation approach inspired by text generation in natural language processing (NLP). Our method, named Pseudo-Prompt Generating (PsPG), capitalizes on the prior knowledge of multi-modal features. Featuring an RNN-based decoder, PsPG autoregressively generates class-tailored embedding vectors, i.e., pseudo-prompts. Comparative evaluations on various multi-label chest radiograph datasets affirm the superiority of our approach over leading medical vision-language and multi-label prompt learning methods. The source code is available at https://github.com/fallingnight/PsPG.
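The abstract describes an RNN-based decoder that autoregressively emits pseudo-prompt embedding vectors conditioned on multi-modal features. The following is a minimal pure-Python sketch of that generation loop only; the weight matrices, dimensions, and the `PseudoPromptDecoder` name are illustrative placeholders (in PsPG the parameters would be learned and the conditioning feature would come from the VLM's fused image-text representation), not the authors' implementation.

```python
import math
import random

random.seed(0)

def matvec(W, x):
    """Dense matrix-vector product over plain Python lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def tanh_vec(v):
    return [math.tanh(x) for x in v]

class PseudoPromptDecoder:
    """Sketch of an autoregressive RNN decoder that maps a fused
    multi-modal feature vector to a sequence of pseudo-prompt
    embeddings. Weights are random stand-ins for learned parameters."""

    def __init__(self, dim, prompt_len):
        self.dim = dim
        self.prompt_len = prompt_len
        rand_mat = lambda: [[random.uniform(-0.1, 0.1) for _ in range(dim)]
                            for _ in range(dim)]
        self.W_h = rand_mat()    # hidden-to-hidden weights
        self.W_x = rand_mat()    # input-to-hidden weights
        self.W_out = rand_mat()  # hidden-to-embedding projection

    def generate(self, fused_feature):
        """Emit `prompt_len` embedding vectors one step at a time,
        feeding each output back in as the next input (autoregression),
        with the hidden state seeded by the conditioning feature."""
        h = list(fused_feature)      # condition on the fused feature
        x = [0.0] * self.dim         # start token: zero embedding
        prompts = []
        for _ in range(self.prompt_len):
            h = tanh_vec([a + b for a, b in
                          zip(matvec(self.W_h, h), matvec(self.W_x, x))])
            x = matvec(self.W_out, h)  # next pseudo-prompt vector
            prompts.append(x)
        return prompts

decoder = PseudoPromptDecoder(dim=8, prompt_len=4)
fused = [0.5] * 8  # stand-in for a fused image-text feature
pseudo_prompts = decoder.generate(fused)
print(len(pseudo_prompts), len(pseudo_prompts[0]))  # → 4 8
```

In the full method, the generated vectors would be prepended to each class name's token embeddings and scored against the image feature, so unseen classes receive tailored prompts without any hand-written template.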
Pages: 279-298
Page count: 20
Related Papers
50 records total
  • [1] CPT: Colorful Prompt Tuning for pre-trained vision-language models
    Yao, Yuan
    Zhang, Ao
    Zhang, Zhengyan
    Liu, Zhiyuan
    Chua, Tat-Seng
    Sun, Maosong
    AI OPEN, 2024, 5 : 30 - 38
  • [2] Constraint embedding for prompt tuning in vision-language pre-trained model
    Cheng, Keyang
    Wei, Liutao
    Tang, Jingfeng
    Zhan, Yongzhao
    MULTIMEDIA SYSTEMS, 2025, 31 (01)
  • [3] Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model
    Xing, Yinghui
    Wu, Qirui
    Cheng, De
    Zhang, Shizhou
    Liang, Guoqiang
    Wang, Peng
    Zhang, Yanning
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 2056 - 2068
  • [5] Universal Adversarial Perturbations for Vision-Language Pre-trained Models
    Zhang, Peng-Fei
    Huang, Zi
    Bai, Guangdong
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 862 - 871
  • [6] MuDPT: Multi-modal Deep-symphysis Prompt Tuning for Large Pre-trained Vision-Language Models
    Miao, Yongzhu
    Li, Shasha
    Tang, Jintao
    Wang, Ting
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 25 - 30
  • [7] Harnessing the Power of Pre-trained Vision-Language Models for Efficient Medical Report Generation
    Li, Qi
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 1308 - 1317
  • [8] Multimodal Search on Iconclass using Vision-Language Pre-Trained Models
    Santini, Cristian
    Posthumus, Etienne
    Tietz, Tabea
    Tan, Mary Ann
    Bruns, Oleksandra
    Sack, Harald
    2023 ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES, JCDL, 2023, : 285 - 287
  • [9] Multi-Label Conditional Generation From Pre-Trained Models
    Proszewska, Magdalena
    Wolczyk, Maciej
    Zieba, Maciej
    Wielopolski, Patryk
    Maziarka, Lukasz
    Smieja, Marek
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (09) : 6185 - 6198
  • [10] Open-Vocabulary Multi-label Image Classification with Pretrained Vision-Language Model
    Dao, Son D.
    Huynh, Dat
    Zhao, He
    Phung, Dinh
    Cai, Jianfei
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2135 - 2140