Evaluation of Responses to Questions About Keratoconus Using ChatGPT-4.0, Google Gemini and Microsoft Copilot: A Comparative Study of Large Language Models on Keratoconus

Cited by: 0
Authors
Demir, Suleyman [1 ]
Affiliations
[1] Adana 5 Ocak State Hosp, Dept Ophthalmol, Adana, Turkiye
Source
Keywords
ChatGPT-4.0; Google Gemini; Microsoft Copilot; Artificial intelligence; Keratoconus; INFLAMMATORY MOLECULES; CORNEAL;
DOI
10.1097/ICL.0000000000001158
Chinese Library Classification (CLC)
R77 [Ophthalmology]
Discipline Code
100212
Abstract
Objectives: Large language models (LLMs) are increasingly used in clinical contexts, making their ability to provide accurate information to patients and physicians ever more important. This study aimed to evaluate the effectiveness of generative pre-trained transformer 4.0 (ChatGPT-4.0), Google Gemini, and Microsoft Copilot in responding to patient questions regarding keratoconus. Methods: The LLMs' responses to the 25 questions about keratoconus most commonly asked by real-life patients were blindly rated by two ophthalmologists using a 5-point Likert scale. In addition, the DISCERN scale was used to evaluate the reliability of the responses, and the Flesch Reading Ease and Flesch-Kincaid Grade Level indices were used to assess readability. Results: ChatGPT-4.0 provided more detailed and accurate answers to patients' questions about keratoconus than Google Gemini and Microsoft Copilot, with 92% of its answers rated "agree" or "strongly agree." Likert scores differed significantly among all three LLMs (P<0.001). Conclusions: Although ChatGPT-4.0's answers to questions about keratoconus were more complex for patients than those of the other language models, the information it provided was reliable and accurate.
Pages: e107-e111
Page count: 5