Comparing Vision-Capable Models, GPT-4 and Gemini, With GPT-3.5 on Taiwan's Pulmonologist Exam

Cited by: 1
Authors
Chen, Chih-Hsiung [1 ]
Hsieh, Kuang-Yu [1 ]
Huang, Kuo-En [1 ]
Lai, Hsien-Yun [2 ]
Affiliations
[1] Mennonite Christian Hosp, Dept Crit Care Med, Hualien, Taiwan
[2] Mennonite Christian Hosp, Dept Educ & Res, Hualien, Taiwan
Keywords
vision feature; pulmonologist exam; gemini; gpt; large language models; artificial intelligence
DOI
10.7759/cureus.67641
Chinese Library Classification
R5 [Internal Medicine]
Discipline Code
1002; 100201
Abstract
Introduction: The latest generation of large language models (LLMs) features multimodal capabilities, allowing them to interpret graphics, images, and videos, abilities that are crucial in medical fields. This study investigates the vision capabilities of the next-generation Generative Pre-trained Transformer 4 (GPT-4) and Google's Gemini.
Methods: To establish a comparative baseline, we used GPT-3.5, a text-only model, and evaluated GPT-4, Gemini, and GPT-3.5 on questions from the Taiwan Specialist Board Exams in Pulmonary and Critical Care Medicine. Our dataset comprised 1,100 questions from 2013 to 2023, with 100 questions per year. Of these, 1,059 were pure text and 41 were text with images; most were in a non-English language, and only six were in pure English.
Results: On each annual 100-question exam from 2013 to 2023, GPT-4 scored 66, 69, 51, 64, 72, 64, 66, 64, 63, 68, and 67, respectively. Gemini scored 45, 48, 45, 45, 46, 59, 54, 41, 53, 45, and 45, while GPT-3.5 scored 39, 33, 35, 36, 32, 33, 43, 28, 32, 33, and 36.
Conclusions: These results demonstrate that the newer LLMs with vision capabilities significantly outperform the text-only model. With a passing score set at 60, GPT-4 passed most of the exams and approached human performance.
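As a quick check on the figures quoted above, the short Python sketch below tabulates the per-year scores reported in the Results section and derives each model's mean score and the number of exams passed at the 60-point threshold mentioned in the Conclusions. The score lists are copied from the abstract; the aggregation code itself is an illustrative editorial sketch, not part of the authors' methodology.

```python
# Per-year exam scores (2013-2023, 100 questions per year) as reported in the abstract.
# The 60-point pass mark follows the Conclusions section; the summary below is an
# editorial sketch, not the study's analysis code.
scores = {
    "GPT-4":   [66, 69, 51, 64, 72, 64, 66, 64, 63, 68, 67],
    "Gemini":  [45, 48, 45, 45, 46, 59, 54, 41, 53, 45, 45],
    "GPT-3.5": [39, 33, 35, 36, 32, 33, 43, 28, 32, 33, 36],
}
PASS_MARK = 60

for model, yearly in scores.items():
    mean = sum(yearly) / len(yearly)
    passed = sum(s >= PASS_MARK for s in yearly)  # count of annual exams at or above the pass mark
    print(f"{model:8s} mean = {mean:.1f}, passed {passed}/{len(yearly)} exams")
```

Run as-is, this reproduces the abstract's conclusion: GPT-4 passes 10 of the 11 annual exams (failing only the year it scored 51), while Gemini and GPT-3.5 pass none.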
Pages: 9