Role of visual information in multimodal large language model performance: an evaluation using the Japanese nuclear medicine board examination

Authors
Watanabe, Takashi [1]
Baba, Akira [1]
Fukuda, Takeshi [1]
Watanabe, Ken [1]
Woo, Jun [1]
Ojiri, Hiroya [1]
Affiliations
[1] Jikei Univ, Sch Med, Dept Radiol, 3-25-8 Nishi Shimbashi,Minato Ku, Tokyo 1058461, Japan
Keywords
Large language models; Visual information; Japanese nuclear medicine board examination
DOI
10.1007/s12149-024-01992-8
Chinese Library Classification
R8 [Special medicine]; R445 [Diagnostic imaging]
Discipline classification codes
1002; 100207; 1009
Abstract
Objectives: This study aimed to assess the performance of state-of-the-art multimodal large language models (LLMs), specifically GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro, on Japanese Nuclear Medicine Board Examination (JNMBE) questions and to evaluate the influence of visual information on their decision-making.
Methods: The study used 92 questions with images from the JNMBE (2019-2023). The LLMs' responses were assessed under two conditions: text and images together, and text only. Each model answered every question three times, and the most frequent answer choice was taken as its final answer. Accuracy and the agreement rates among the model answers were evaluated using statistical tests.
Results: GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro showed no significant differences in accuracy between the text-and-image and text-only conditions. GPT-4o and Claude 3 Opus each achieved an accuracy of 54.3% (95% CI: 44.2%-64.1%) when provided with both text and images; however, they selected the same options as in the text-only condition for 71.7% of the questions. Gemini 1.5 Pro performed significantly worse than GPT-4o in the text-and-image condition. The agreement rates among the model answers ranged from weak to moderate.
Conclusion: Images have only a limited influence on decision-making in nuclear medicine, even for the latest multimodal LLMs, and their diagnostic ability in this highly specialized field remains insufficient. Improving the use of image information and enhancing answer reproducibility are crucial for the effective application of LLMs in nuclear medicine education and practice. Further advances in these areas are needed to harness the potential of LLMs as assistants in nuclear medicine diagnosis.
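The scoring procedure described in the abstract (three runs per question, most frequent choice as the final answer, accuracy with a 95% confidence interval) can be sketched as follows. This is a minimal illustration, not the authors' code. The function names are hypothetical, the tie-breaking rule is an assumption because the abstract does not say how three different answers were handled, and the Wilson score interval is assumed only because it reproduces the reported 44.2%-64.1% range. The count of 50 correct answers is inferred from 54.3% of 92 questions and is not stated in the record.

```python
from collections import Counter
from math import sqrt


def majority_answer(answers):
    """Return the most frequent answer among repeated runs.

    Assumption: when every run gives a different answer, the first
    run's answer is returned (the abstract does not specify a rule).
    """
    # most_common orders equal counts by first appearance
    return Counter(answers).most_common(1)[0][0]


def wilson_interval(correct, total, z=1.959964):
    """Wilson score interval for a proportion (assumed CI method)."""
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Three hypothetical runs of one model on one question
    print(majority_answer(["c", "a", "c"]))

    # 50 of 92 is the inferred count behind the reported 54.3%
    low, high = wilson_interval(50, 92)
    print(f"accuracy {50 / 92:.1%}, 95% CI {low:.1%}-{high:.1%}")
```

Under these assumptions, the interval computed for 50 of 92 rounds to the same bounds as the record reports, which suggests (but does not confirm) that a Wilson-type interval was used.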
Pages: 217-224
Number of pages: 8