Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-label Medical Image Classification

Cited by: 0
Authors
Ye, Yaoqin [1]
Zhang, Junjie [1]
Shi, Hongwei [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China
Keywords
Prompt Learning; Medical Image Recognition; Multi-label Classification; Vision-Language Models;
DOI
10.1007/978-981-97-8496-7_20
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The task of medical image recognition is notably complicated by the presence of varied and multiple pathological indications, presenting a unique challenge in multi-label classification with unseen labels. This complexity underlines the need for computer-aided diagnosis methods employing multi-label zero-shot learning. Recent advancements in pre-trained vision-language models (VLMs) have showcased notable zero-shot classification abilities on medical images. However, these methods are limited in leveraging extensive pre-trained knowledge from broader datasets, and often depend on manual prompt construction by expert radiologists. By automating the process of prompt tuning, prompt learning techniques have emerged as an efficient way to adapt VLMs to downstream tasks. Yet, existing CoOp-based strategies fall short in producing class-specific prompts for unseen categories, limiting generalizability in fine-grained scenarios. To overcome these constraints, we introduce a novel prompt generation approach inspired by text generation in natural language processing (NLP). Our method, named Pseudo-Prompt Generating (PsPG), capitalizes on the prior knowledge of multi-modal features. Featuring an RNN-based decoder, PsPG autoregressively generates class-tailored embedding vectors, i.e., pseudo-prompts. Comparative evaluations on various multi-label chest radiograph datasets affirm the superiority of our approach against leading medical vision-language and multi-label prompt learning methods. The source code is available at https://github.com/fallingnight/PsPG.
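To make the idea of autoregressive pseudo-prompt generation concrete, the following is a minimal sketch and not the PsPG implementation. The abstract states only that an RNN-based decoder autoregressively generates class-tailored embedding vectors from multi-modal features; everything else here is an assumption made for illustration: the plain Elman-style recurrence, the random untrained weights, the choice to initialize the hidden state from the image feature and to feed the class-name embedding as the first input, and all names (`PseudoPromptDecoder`, `generate`, `matvec`, `rand_matrix`).

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; stands in for trained weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class PseudoPromptDecoder:
    """Toy Elman-RNN decoder (hypothetical, untrained)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.W_in = rand_matrix(dim, dim, rng)   # previous output -> hidden
        self.W_h = rand_matrix(dim, dim, rng)    # hidden -> hidden
        self.W_out = rand_matrix(dim, dim, rng)  # hidden -> embedding

    def generate(self, image_feat, class_feat, length):
        # Assumption: the image feature seeds the hidden state and the
        # class-name embedding is the first decoder input.
        h = [math.tanh(x) for x in image_feat]
        x = list(class_feat)
        prompts = []
        for _ in range(length):
            pre = [a + b for a, b in zip(matvec(self.W_in, x), matvec(self.W_h, h))]
            h = [math.tanh(p) for p in pre]
            x = matvec(self.W_out, h)  # output is fed back as the next input
            prompts.append(x)
        return prompts


# Usage: one image feature, two class embeddings, three pseudo-prompt vectors each.
decoder = PseudoPromptDecoder(dim=4)
image_feat = [0.2, -0.1, 0.4, 0.0]
class_a = [1.0, 0.0, 0.0, 0.0]
class_b = [0.0, 1.0, 0.0, 0.0]
prompts_a = decoder.generate(image_feat, class_a, length=3)
prompts_b = decoder.generate(image_feat, class_b, length=3)
```

Because each step consumes the previous step's output, the generated sequence depends on both the image and the class embedding, which is what makes the vectors class-tailored in this sketch. In a real system the vectors would presumably be passed to the VLM text encoder in place of hand-written prompt tokens; that step is omitted here.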
Pages: 279-298
Page count: 20