Evaluating the image recognition capabilities of GPT-4V and Gemini Pro in the Japanese national dental examination

Cited by: 0

Authors:

Fukuda, Hikaru

Morishita, Masaki ^{[1,2,3]}

Muraoka, Kosuke ^{[3]}

Yamaguchi, Shino ^{[4]}

Nakamura, Taiji ^{[5]}

Yoshioka, Izumi ^{[6]}

Awano, Shuji ^{[3]}

Ono, Kentaro ^{[7]}

Affiliations:

[1] Kyushu Dent Univ, Dept Sci Phys Funct, Div Maxillofacial Surg, Kitakyushu, Japan

[2] Kyushu Dent Univ Hosp, Hlth Informat Management Off, Kitakyushu, Japan

[3] Dept Oral Funct, Div Clin Educ Dev & Res, Kyushu, Japan

[4] Kyushu Dent Univ, Sch Oral Hlth Sci, Kitakyushu, Japan

[5] Kyushu Dent Univ, Dept Oral Funct, Div Periodontol, Kitakyushu, Japan

[6] Kyushu Dent Univ, Dept Sci Phys Funct, Div Oral Med, Kitakyushu, Japan

[7] Kyushu Dent Univ, Dept Hlth Promot, Div Physiol, Kitakyushu, Japan

Source:

JOURNAL OF DENTAL SCIENCES | 2025 / Volume 20 / Issue 01

Keywords:

ChatGPT-4V; Gemini Pro; Japanese national dental examination; Large language models

DOI:

10.1016/j.jds.2024.06.015

Chinese Library Classification (CLC) number:

R78 [Stomatology]

Discipline classification code:

1003

Abstract:

Background/purpose: OpenAI's GPT-4V and Google's Gemini Pro are Large Language Models (LLMs) equipped with image recognition capabilities. They have the potential to be used in future medical diagnosis and treatment and to serve as valuable educational support tools for students. This study compared and evaluated the image recognition capabilities of GPT-4V and Gemini Pro using questions from the Japanese National Dental Examination (JNDE) to investigate their potential as educational support tools. Materials and methods: We analyzed 160 questions from the 116th JNDE, administered in March 2023, using GPT-4V and Gemini Pro, both of which have image recognition functions. Standardized prompts were used for all LLMs, and statistical analysis was conducted using Fisher's exact test and the Mann-Whitney U test. Results: For the 160 JNDE questions, the accuracy rates of GPT-4V and Gemini Pro were 35.0% and 28.1%, respectively. GPT-4V scored higher, although the difference was not statistically significant. Across dental specialties, the accuracy rates of GPT-4V were generally higher than those of Gemini Pro, with some areas showing equal accuracy. Accuracy rates tended to decrease as the number of images within a question increased, suggesting that the number of images influenced the correctness of the responses. Conclusion: The overall superior performance of GPT-4V compared to Gemini Pro may be attributed to the continuous updates in OpenAI's model. This research demonstrates the potential of LLMs as educational support tools in dentistry, while also highlighting areas that [...] © 2025 Association for Dental Sciences of the Republic of China. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
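The abstract reports that the two accuracy rates were compared with Fisher's exact test. The following is a minimal sketch of such a comparison, not the authors' analysis code. The counts are an assumption: 56 and 45 correct answers out of 160 are back-calculated from the rounded percentages 35.0% and 28.1%, and the function name `fisher_exact_two_sided` is hypothetical. The test is implemented with the standard library only, summing the hypergeometric probabilities of all 2x2 tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two models; columns are correct / incorrect counts.
    """
    row1, row2 = a + b, c + d      # questions per model
    col1 = a + c                   # total correct answers across both models
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that row 1 holds x of the col1 successes.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table at least as extreme (no more probable) than the observed
    # one; the small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Assumed counts, back-calculated from the reported percentages (not from the paper's data).
gpt4v_correct, gemini_correct, n_questions = 56, 45, 160

p = fisher_exact_two_sided(
    gpt4v_correct, n_questions - gpt4v_correct,
    gemini_correct, n_questions - gemini_correct,
)
print(f"two-sided p-value: {p:.3f}")
```

Under these assumed counts the p-value comes out above 0.05, which is consistent with the abstract's statement that the difference was not statistically significant.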


Pages: 368-372

Page count: 5

