Vision-BioLLM: Large vision language model for visual dialogue in biomedical imagery

被引:0
|
作者
Alshibli, Ahmad [1 ]
Bazi, Yakoub [2 ]
Rahhal, Mohamad Mahmoud Al [3 ]
Zuair, Mansour [2 ]
机构
[1] King Saud Univ, Coll Comp & Informat Sci, Comp Sci Dept, Riyadh 11543, Saudi Arabia
[2] King Saud Univ, Coll Comp & Informat Sci, Comp Engn Dept, Riyadh 11543, Saudi Arabia
[3] King Saud Univ, Coll Appl Comp Sci, Appl Comp Sci Dept, Riyadh 11543, Saudi Arabia
关键词
Large vision language model; Biomedical images; Transformers; Visual question answering; Captioning;
D O I
10.1016/j.bspc.2024.107437
中图分类号
R318 [生物医学工程];
学科分类号
0831 ;
摘要
In this paper, we present a vision-language model tailored for visual dialogue in the biomedical domain, utilizing a LanguageBind transformer as the vision encoder and Llama3-OpenBioLLM as the language decoder. Our training approach involves three stages: alignment, instruction-tuning, and task-specific fine-tuning. The alignment phase synchronizes outputs from the vision encoder with inputs to the decoder using a multi-layer perceptron (MLP). In the instruction-tuning phase, we enhance language comprehension through low-rank adaptation (LoRA) with a mixed dataset of general and biomedical images. We also improve three biomedical datasets by transforming visual question datasets into dialogue contexts and adding concise summaries of dialogues. Experimental results demonstrate the model's effectiveness against state-of-the-art methods, showcasing its potential to enhance biomedical visual dialogue. Code and models are available at: http://github.com/Big Data-KSU/Vision-BioLLM-KSU.
引用
收藏
页数:13
相关论文
共 50 条
  • [1] Vision-Language Model for Visual Question Answering in Medical Imagery
    Bazi, Yakoub
    Al Rahhal, Mohamad Mahmoud
    Bashmal, Laila
    Zuair, Mansour
    BIOENGINEERING-BASEL, 2023, 10 (03):
  • [2] CoLLaVO: Crayon Large Language and Vision mOdel
    Lee, Byung-Kwan
    Park, Beomchan
    Kim, Chae Won
    Ro, Yong Man
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 1121 - 1138
  • [3] Vary: Scaling up the Vision Vocabulary for Large Vision-Language Model
    Wei, Haoran
    Kong, Lingyu
    Chen, Jinyue
    Zhao, Liang
    Ge, Zheng
    Yang, Jinrong
    Sun, Jianjian
    Han, Chunrui
    Zhang, Xiangyu
    COMPUTER VISION-ECCV 2024, PT IV, 2025, 15062 : 408 - 424
  • [4] VISUAL-IMAGERY AS THE SIMULATION OF VISION
    CURRIE, G
    MIND & LANGUAGE, 1995, 10 (1-2) : 25 - 44
  • [5] MiniMedGPT: Efficient Large Vision-Language Model for medical Visual Question Answering
    Alsabbagh, Abdel Rahman
    Mansour, Tariq
    Al-Kharabsheh, Mohammad
    Ebdah, Abdel Salam
    Al-Emaryeen, Roa'a
    Al-Nahhas, Sara
    Mahafza, Waleed
    Al-Kadi, Omar
    PATTERN RECOGNITION LETTERS, 2025, 189 : 8 - 16
  • [6] A generalist vision-language foundation model for diverse biomedical tasks
    Zhang, Kai
    Zhou, Rong
    Adhikarla, Eashan
    Yan, Zhiling
    Liu, Yixin
    Yu, Jun
    Liu, Zhengliang
    Chen, Xun
    Davison, Brian D.
    Ren, Hui
    Huang, Jing
    Chen, Chen
    Zhou, Yuyin
    Fu, Sunyang
    Liu, Wei
    Liu, Tianming
    Li, Xiang
    Chen, Yong
    He, Lifang
    Zou, James
    Li, Quanzheng
    Liu, Hongfang
    Sun, Lichao
    NATURE MEDICINE, 2024, 30 (11) : 3129 - 3141
  • [7] HOW VISUAL-IMAGERY INTERFERES WITH VISION
    CRAVERLEMLEY, C
    REEVES, A
    PSYCHOLOGICAL REVIEW, 1992, 99 (04) : 633 - 649
  • [8] Neglect in vision and visual imagery: A double dissociation
    Coslett, HB
    BRAIN, 1997, 120 : 1163 - 1171
  • [9] Visual In-Context Learning for Large Vision-Language Models
    Zhou, Yucheng
    Le, Xiang
    Wang, Qianning
    Shen, Jianbing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 15890 - 15902
  • [10] Vision-Language Models for Biomedical Applications
    Thapa, Surendrabikram
    Naseem, Usman
    Zhou, Luping
    Kim, Jinman
    PROCEEDINGS OF THE FIRST INTERNATIONAL WORKSHOP ON VISION-LANGUAGE MODELS FOR BIOMEDICAL APPLICATIONS, VLM4BIO 2024, 2024, : 1 - 2