Evaluating the image recognition capabilities of GPT-4V and Gemini Pro in the Japanese national dental examination

Cited by: 0
Authors
Fukuda, Hikaru
Morishita, Masaki [1, 2, 3]
Muraoka, Kosuke [3]
Yamaguchi, Shino [4]
Nakamura, Taiji [5]
Yoshioka, Izumi [6]
Awano, Shuji [3]
Ono, Kentaro [7]
Affiliations
[1] Kyushu Dent Univ, Dept Sci Phys Funct, Div Maxillofacial Surg, Kitakyushu, Japan
[2] Kyushu Dent Univ Hosp, Hlth Informat Management Off, Kitakyushu, Japan
[3] Dept Oral Funct, Div Clin Educ Dev & Res, Kyushu, Japan
[4] Kyushu Dent Univ, Sch Oral Hlth Sci, Kitakyushu, Japan
[5] Kyushu Dent Univ, Dept Oral Funct, Div Periodontol, Kitakyushu, Japan
[6] Kyushu Dent Univ, Dept Sci Phys Funct, Div Oral Med, Kitakyushu, Japan
[7] Kyushu Dent Univ, Dept Hlth Promot, Div Physiol, Kitakyushu, Japan
Keywords
ChatGPT-4V; Gemini Pro; Japanese national dental examination; Large language models
DOI
10.1016/j.jds.2024.06.015
Chinese Library Classification number
R78 [Stomatology]
Discipline classification code
1003
Abstract
Background/purpose: OpenAI's GPT-4V and Google's Gemini Pro are Large Language Models (LLMs) equipped with image recognition capabilities. They have the potential to be used in future medical diagnosis and treatment and to serve as valuable educational support tools for students. This study compared and evaluated the image recognition capabilities of GPT-4V and Gemini Pro using questions from the Japanese National Dental Examination (JNDE) to investigate their potential as educational support tools. Materials and methods: We analyzed 160 questions from the 116th JNDE, administered in March 2023, using ChatGPT-4V and Gemini Pro, both of which have image recognition functions. Standardized prompts were used for both LLMs, and statistical analysis was conducted using Fisher's exact test and the Mann-Whitney U test. Results: For the 160 JNDE questions, the accuracy rates of GPT-4V and Gemini Pro were 35.0% and 28.1%, respectively. GPT-4V scored higher, although the difference was not statistically significant. Across dental specialties, the accuracy rates of GPT-4V were generally higher than those of Gemini Pro, with some areas showing equal accuracy. Accuracy rates tended to decrease as the number of images within a question increased, suggesting that the number of images influenced the correctness of the responses. Conclusion: The overall superior performance of GPT-4V compared to Gemini Pro may be attributed to the continuous updates to OpenAI's model. This research demonstrates the potential of LLMs as educational support tools in dentistry, while also highlighting areas that [...]
(c) 2025 Association for Dental Sciences of the Republic of China. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
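The overall accuracy comparison reported in the abstract can be illustrated with a small sketch of a two-sided Fisher's exact test. The correct-answer counts below (56 and 45 out of 160) are an assumption back-calculated from the reported 35.0% and 28.1%; the abstract does not state raw counts, and the function name `fisher_exact_two_sided` is hypothetical. This is not the authors' analysis code.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Assumed counts: 35.0% of 160 = 56 correct; 28.1% of 160 is about 45 correct.
N_QUESTIONS = 160
gpt4v_correct = 56
gemini_correct = 45

p_value = fisher_exact_two_sided(
    gpt4v_correct, N_QUESTIONS - gpt4v_correct,
    gemini_correct, N_QUESTIONS - gemini_correct,
)
print(f"two-sided p = {p_value:.3f}")
```

Under these assumed counts the p-value comes out above 0.05, which is consistent with the abstract's statement that the difference was not statistically significant.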
Pages: 368-372
Page count: 5