Comparing Vision-Capable Models, GPT-4 and Gemini, With GPT-3.5 on Taiwan's Pulmonologist Exam

Cited by: 1
|
Authors
Chen, Chih-Hsiung [1 ]
Hsieh, Kuang-Yu [1 ]
Huang, Kuo-En [1 ]
Lai, Hsien-Yun [2 ]
Affiliations
[1] Mennonite Christian Hosp, Dept Crit Care Med, Hualien, Taiwan
[2] Mennonite Christian Hosp, Dept Educ & Res, Hualien, Taiwan
Keywords
vision feature; pulmonologist exam; Gemini; GPT; large language models; artificial intelligence
DOI
10.7759/cureus.67641
Chinese Library Classification (CLC)
R5 [Internal Medicine]
Discipline Code
1002; 100201
Abstract
Introduction: The latest generation of large language models (LLMs) is multimodal, able to interpret graphics, images, and videos, capabilities that are crucial in medical fields. This study investigates the vision capabilities of the Generative Pre-trained Transformer 4 (GPT-4) and Google's Gemini.
Methods: To establish a comparative baseline, we used GPT-3.5, a model limited to text processing, and evaluated all three models on questions from the Taiwan Specialist Board Exams in Pulmonary and Critical Care Medicine. Our dataset comprised 1,100 questions from 2013 to 2023, 100 per year. Of these, 1,059 were pure text and 41 combined text with images; the majority were in a non-English language, with only six in pure English.
Results: On each annual 100-question exam from 2013 to 2023, GPT-4 scored 66, 69, 51, 64, 72, 64, 66, 64, 63, 68, and 67, respectively. Gemini scored 45, 48, 45, 45, 46, 59, 54, 41, 53, 45, and 45, while GPT-3.5 scored 39, 33, 35, 36, 32, 33, 43, 28, 32, 33, and 36.
Conclusions: These results demonstrate that the newer, vision-capable LLMs significantly outperform the text-only model. With the passing score set at 60, GPT-4 passed most exams and approached human performance.
Pages: 9