Vision-BioLLM: Large vision language model for visual dialogue in biomedical imagery

Cited: 0
Authors
Alshibli, Ahmad [1 ]
Bazi, Yakoub [2 ]
Rahhal, Mohamad Mahmoud Al [3 ]
Zuair, Mansour [2 ]
Affiliations
[1] King Saud Univ, Coll Comp & Informat Sci, Comp Sci Dept, Riyadh 11543, Saudi Arabia
[2] King Saud Univ, Coll Comp & Informat Sci, Comp Engn Dept, Riyadh 11543, Saudi Arabia
[3] King Saud Univ, Coll Appl Comp Sci, Appl Comp Sci Dept, Riyadh 11543, Saudi Arabia
Keywords
Large vision language model; Biomedical images; Transformers; Visual question answering; Captioning;
DOI
10.1016/j.bspc.2024.107437
CLC Number
R318 [Biomedical Engineering];
Subject Classification Code
0831;
Abstract
In this paper, we present a vision-language model tailored for visual dialogue in the biomedical domain, utilizing a LanguageBind transformer as the vision encoder and Llama3-OpenBioLLM as the language decoder. Our training approach involves three stages: alignment, instruction-tuning, and task-specific fine-tuning. The alignment phase synchronizes outputs from the vision encoder with inputs to the decoder using a multi-layer perceptron (MLP). In the instruction-tuning phase, we enhance language comprehension through low-rank adaptation (LoRA) with a mixed dataset of general and biomedical images. We also improve three biomedical datasets by transforming visual question datasets into dialogue contexts and adding concise summaries of dialogues. Experimental results demonstrate the model's effectiveness against state-of-the-art methods, showcasing its potential to enhance biomedical visual dialogue. Code and models are available at: http://github.com/BigData-KSU/Vision-BioLLM-KSU.
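The MLP alignment stage described in the abstract can be sketched in a few lines: vision-encoder token features are projected through a small MLP into the dimensionality of the language decoder's embedding space, so that visual tokens can be prepended to text embeddings. This is a minimal NumPy illustration only; the layer sizes (1024 → 2048 → 4096), the 196-token input, and the ReLU nonlinearity are assumptions for the sketch, not the authors' exact configuration.

```python
import numpy as np

def mlp_projector(vision_feats, w1, b1, w2, b2):
    """Two-layer MLP mapping vision-encoder tokens into the LLM embedding space."""
    hidden = np.maximum(vision_feats @ w1 + b1, 0.0)  # ReLU for brevity
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
d_vision, d_hidden, d_llm = 1024, 2048, 4096      # assumed dimensions
tokens = rng.standard_normal((196, d_vision))      # e.g. 196 patch tokens from the encoder

w1 = rng.standard_normal((d_vision, d_hidden)) * 0.02
b1 = np.zeros(d_hidden)
w2 = rng.standard_normal((d_hidden, d_llm)) * 0.02
b2 = np.zeros(d_llm)

aligned = mlp_projector(tokens, w1, b1, w2, b2)
print(aligned.shape)  # (196, 4096): ready to concatenate with the decoder's text embeddings
```

In the paper's setup, only this projector would be trained during alignment (encoder and decoder frozen), so visual and textual tokens share one embedding space before instruction-tuning begins.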
Pages: 13