Vision-BioLLM: Large vision language model for visual dialogue in biomedical imagery

Authors
Alshibli, Ahmad [1]
Bazi, Yakoub [2]
Rahhal, Mohamad Mahmoud Al [3]
Zuair, Mansour [2]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Comp Sci Dept, Riyadh 11543, Saudi Arabia
[2] King Saud Univ, Coll Comp & Informat Sci, Comp Engn Dept, Riyadh 11543, Saudi Arabia
[3] King Saud Univ, Coll Appl Comp Sci, Appl Comp Sci Dept, Riyadh 11543, Saudi Arabia
Keywords
Large vision language model; Biomedical images; Transformers; Visual question answering; Captioning
DOI
10.1016/j.bspc.2024.107437
Chinese Library Classification (CLC) number
R318 [Biomedical Engineering]
Discipline classification code
0831
Abstract
In this paper, we present a vision-language model tailored for visual dialogue in the biomedical domain, utilizing a LanguageBind transformer as the vision encoder and Llama3-OpenBioLLM as the language decoder. Our training approach involves three stages: alignment, instruction-tuning, and task-specific fine-tuning. The alignment phase synchronizes outputs from the vision encoder with inputs to the decoder using a multi-layer perceptron (MLP). In the instruction-tuning phase, we enhance language comprehension through low-rank adaptation (LoRA) with a mixed dataset of general and biomedical images. We also improve three biomedical datasets by transforming visual question datasets into dialogue contexts and adding concise summaries of dialogues. Experimental results demonstrate the model's effectiveness against state-of-the-art methods, showcasing its potential to enhance biomedical visual dialogue. Code and models are available at: http://github.com/BigData-KSU/Vision-BioLLM-KSU.
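
The abstract names low-rank adaptation (LoRA) as the instruction-tuning mechanism. The following is a minimal sketch of the generic LoRA forward pass, not the authors' implementation: the frozen weight W is combined with a trainable low-rank update, giving W x + (alpha / r) * B (A x). The function names, dimensions, and the value of `alpha` are all illustrative assumptions; none of them come from the paper.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha):
    """Generic LoRA forward pass: W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight.
    A (r x d_in) and B (d_out x r) are the trainable low-rank factors.
    All names and shapes here are illustrative, not taken from the paper.
    """
    r = len(A)  # rank = number of rows in A
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical toy dimensions, chosen only for illustration.
random.seed(0)
d_in, d_out, r = 4, 3, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[0.0] * r for _ in range(d_out)]  # zero init: the adapter starts as a no-op
x = [1.0, 2.0, 3.0, 4.0]

y = lora_forward(W, A, B, x, alpha=16)
```

Because B is initialized to zero, `y` equals the frozen layer's output `matvec(W, x)` before any training step. Only A and B, with r * (d_in + d_out) parameters, would be updated, rather than the d_in * d_out entries of W.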
Pages: 13