Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-label Medical Image Classification

Cited: 0
Authors
Ye, Yaoqin [1 ]
Zhang, Junjie [1 ]
Shi, Hongwei [1 ]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu, Peoples R China
Keywords
Prompt Learning; Medical Image Recognition; Multi-label Classification; Vision-Language Models;
DOI
10.1007/978-981-97-8496-7_20
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
The task of medical image recognition is notably complicated by the presence of varied and multiple pathological indications, presenting a unique challenge in multi-label classification with unseen labels. This complexity underlines the need for computer-aided diagnosis methods employing multi-label zero-shot learning. Recent advancements in pre-trained vision-language models (VLMs) have showcased notable zero-shot classification abilities on medical images. However, these methods are limited in leveraging extensive pre-trained knowledge from broader datasets and often depend on manual prompt construction by expert radiologists. By automating prompt tuning, prompt learning techniques have emerged as an efficient way to adapt VLMs to downstream tasks. Yet existing CoOp-based strategies fall short in producing class-specific prompts for unseen categories, limiting generalizability in fine-grained scenarios. To overcome these constraints, we introduce a novel prompt generation approach inspired by text generation in natural language processing (NLP). Our method, named Pseudo-Prompt Generating (PsPG), capitalizes on prior knowledge of multi-modal features. Featuring an RNN-based decoder, PsPG autoregressively generates class-tailored embedding vectors, i.e., pseudo-prompts. Comparative evaluations on various multi-label chest radiograph datasets affirm the superiority of our approach over leading medical vision-language and multi-label prompt learning methods. The source code is available at https://github.com/fallingnight/PsPG.
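The abstract's core mechanism is an RNN-based decoder that autoregressively emits class-tailored prompt embeddings ("pseudo-prompts") from fused multi-modal features. The sketch below is a rough illustration of that idea only, not the authors' implementation (see the linked repository for that): a plain tanh RNN cell stands in for the paper's decoder, and all names, dimensions, and the random weight initialization are placeholder assumptions.

```python
import numpy as np

def generate_pseudo_prompts(fused_feat, W_h, W_x, W_out, prompt_len=16):
    """Autoregressively emit pseudo-prompt embedding vectors from a
    fused multi-modal feature (hypothetical sketch of a PsPG-style
    decoder; a plain tanh RNN cell replaces the paper's RNN)."""
    h = fused_feat                        # seed hidden state with the fused feature
    tok = np.zeros(W_x.shape[1])          # stand-in for a learned <start> embedding
    prompts = []
    for _ in range(prompt_len):
        h = np.tanh(W_h @ h + W_x @ tok)  # recurrent state update
        tok = W_out @ h                   # next pseudo-prompt embedding
        prompts.append(tok)               # each step conditions on the previous token
    return np.stack(prompts)              # shape: (prompt_len, embed_dim)

# Toy usage with placeholder dimensions and random weights.
rng = np.random.default_rng(0)
d = 64                                    # toy embedding size for illustration
W_h, W_x, W_out = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
fused = rng.standard_normal(d)            # stand-in for a fused image/label feature
pp = generate_pseudo_prompts(fused, W_h, W_x, W_out, prompt_len=8)
print(pp.shape)  # (8, 64)
```

In a full pipeline, each class's generated pseudo-prompt sequence would be fed to the VLM's text encoder in place of hand-written radiologist prompts, which is what lets the method produce class-specific prompts even for unseen categories.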
Pages: 279-298 (20 pages)