Analysis of Responses of GPT-4V to the Japanese National Clinical Engineer Licensing Examination

Cited by: 1
Authors
Ishida, Kai [1 ]
Arisaka, Naoya [2 ]
Fujii, Kiyotaka [3 ]
Affiliations
[1] Shonan Inst Technol, Fac Engn, Dept Mat & Human Environm Sci, Fujisawa, Japan
[2] Kitasato Univ, Sch Allied Hlth Sci, Dept Med Informat, Sagamihara, Kanagawa, Japan
[3] Kitasato Univ, Sch Allied Hlth Sci, Dept Clin Engn, Sagamihara, Kanagawa, Japan
Keywords
ChatGPT; Multimodal large language models; Artificial intelligence; Clinical engineer; Licensing examination; Medical education
DOI: 10.1007/s10916-024-02103-w
Chinese Library Classification (CLC): R19 [Health Organization and Services (Health Services Administration)]
Abstract
Chat Generative Pretrained Transformer (ChatGPT; OpenAI) is a state-of-the-art large language model that can simulate human-like conversations based on user input. We evaluated the performance of GPT-4V on the Japanese National Clinical Engineer Licensing Examination using 2,155 questions from 2012 to 2023. The average correct answer rate across all questions was 86.0%. In particular, clinical medicine, basic medicine, medical materials, biological properties, and mechanical engineering achieved correct answer rates of >= 90%. Conversely, medical device safety management, electrical and electronic engineering, and extracorporeal circulation obtained lower correct answer rates, ranging from 64.8% to 76.5%. The correct answer rates for questions that included figures/tables, required numerical calculation, involved both figures/tables and calculation, or required knowledge of Japanese Industrial Standards were 55.2%, 85.8%, 64.2%, and 31.0%, respectively. These low correct answer rates reflect ChatGPT's limited recognition of images and limited knowledge of standards and laws. This study concludes that careful attention is required when using ChatGPT because several of its explanations contain incorrect descriptions.
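As a minimal illustration of the kind of tabulation reported above (per-category and per-question-type correct answer rates over graded responses), the following Python sketch shows one way such rates could be computed. The record layout, column positions, category labels, and example entries are assumptions made for illustration; they are not taken from the paper's dataset or code.

# Hypothetical sketch: computing correct answer rates from graded model responses.
# Field layout and example records are illustrative assumptions, not study data.
from collections import defaultdict

# Each record: (exam_year, category, question_type, is_correct)
graded = [
    (2023, "clinical medicine", "text", True),
    (2023, "electrical and electronic engineering", "calculation", False),
    (2022, "medical device safety management", "figure/table", False),
    # ... one entry per graded question (the study grades 2,155 questions)
]

def rate_by(field_index, records):
    """Return {group: correct answer rate in %} grouped by the given field."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        group = rec[field_index]
        total[group] += 1
        correct[group] += int(rec[3])
    return {g: 100.0 * correct[g] / total[g] for g in total}

if __name__ == "__main__":
    # Group by category (field index 1); index 2 would group by question type.
    for category, rate in sorted(rate_by(1, graded).items()):
        print(f"{category}: {rate:.1f}%")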
Pages: 9