Assessing the performance of large language models (GPT-3.5 and GPT-4) and accurate clinical information for pediatric nephrology

被引:0
|
作者
Sav, Nadide Melike [1 ]
机构
[1] Duzce Univ, Dept Pediat Nephrol, Duzce, Turkiye
关键词
Artificial intelligence; ChatGPT; Clinical decision support systems; Cohen's d; Cronbach's alpha; Pediatric nephrology;
D O I
10.1007/s00467-025-06723-3
中图分类号
R72 [儿科学];
学科分类号
100202 ;
摘要
Background Artificial intelligence (AI) has emerged as a transformative tool in healthcare, offering significant advancements in providing accurate clinical information. However, the performance and applicability of AI models in specialized fields such as pediatric nephrology remain underexplored. This study is aimed at evaluating the ability of two AI-based language models, GPT-3.5 and GPT-4, to provide accurate and reliable clinical information in pediatric nephrology. The models were evaluated on four criteria: accuracy, scope, patient friendliness, and clinical applicability. Methods Forty pediatric nephrology specialists with >= 5 years of experience rated GPT-3.5 and GPT-4 responses to 10 clinical questions using a 1-5 scale via Google Forms. Ethical approval was obtained, and informed consent was secured from all participants. Results Both GPT-3.5 and GPT-4 demonstrated comparable performance across all criteria, with no statistically significant differences observed (p > 0.05). GPT-4 exhibited slightly higher mean scores in all parameters, but the differences were negligible (Cohen's d < 0.1 for all criteria). Reliability analysis revealed low internal consistency for both models (Cronbach's alpha ranged between 0.019 and 0.162). Correlation analysis indicated no significant relationship between participants' years of professional experience and their evaluations of GPT-3.5 (correlation coefficients ranged from - 0.026 to 0.074). Conclusions While GPT-3.5 and GPT-4 provided a foundational level of clinical information support, neither model exhibited superior performance in addressing the unique challenges of pediatric nephrology. The findings highlight the need for domain-specific training and integration of updated clinical guidelines to enhance the applicability and reliability of AI models in specialized fields. This study underscores the potential of AI in pediatric nephrology while emphasizing the importance of human oversight and the need for further refinements in AI applications.
引用
收藏
页数:7
相关论文
共 50 条
  • [1] Assessing the Performance of GPT-3.5 and GPT-4 on the 2023 Japanese Nursing Examination
    Kaneda, Yudai
    Takahashi, Ryo
    Kaneda, Uiri
    Akashima, Shiori
    Okita, Haruna
    Misaki, Sadaya
    Yamashiro, Akimi
    Ozaki, Akihiko
    Tanimoto, Tetsuya
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (08)
  • [2] Correspondence on Chat GPT-4, GPT-3.5 and drug information queries
    Kleebayoon, Amnuay
    Wiwanitkit, Viroj
    JOURNAL OF TELEMEDICINE AND TELECARE, 2023,
  • [3] Performance of GPT-4 and GPT-3.5 in generating accurate and comprehensive diagnoses across medical subspecialties
    Luk, Dik Wai Anderson
    Ip, Whitney Chin Tung
    Shea, Yat-fung
    JOURNAL OF THE CHINESE MEDICAL ASSOCIATION, 2024, 87 (03) : 259 - 260
  • [4] ChatGPT and Patient Information in Nuclear Medicine: GPT-3.5 Versus GPT-4
    Currie, Geoff
    Robbie, Stephanie
    Tually, Peter
    JOURNAL OF NUCLEAR MEDICINE TECHNOLOGY, 2023, 51 (04) : 307 - 313
  • [5] Chat GPT-4 significantly surpasses GPT-3.5 in drug information queries
    He, Na
    Yan, Yingying
    Wu, Ziyang
    Cheng, Yinchu
    Liu, Fang
    Li, Xiaotong
    Zhai, Suodi
    JOURNAL OF TELEMEDICINE AND TELECARE, 2025, 31 (02) : 306 - 308
  • [6] Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard
    Farhat, Faiza
    Chaudhry, Beenish Moalla
    Nadeem, Mohammad
    Sohail, Shahab Saquib
    Madsen, Dag Oivind
    JMIR MEDICAL EDUCATION, 2024, 10
  • [7] Assessing Generative Pretrained Transformers (GPT) in Clinical Decision-Making: Comparative Analysis of GPT-3.5 and GPT-4
    Lahat, Adi
    Sharif, Kassem
    Zoabi, Narmin
    Patt, Yonatan Shneor
    Sharif, Yousra
    Fisher, Lior
    Shani, Uria
    Arow, Mohamad
    Levin, Roni
    Klang, Eyal
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
  • [8] Evaluation of the performance of GPT-3.5 and GPT-4 on the Polish Medical Final Examination
    Maciej Rosoł
    Jakub S. Gąsior
    Jonasz Łaba
    Kacper Korzeniewski
    Marcel Młyńczak
    Scientific Reports, 13
  • [9] Evaluation of the performance of GPT-3.5 and GPT-4 on the Polish Medical Final Examination
    Rosol, Maciej
    Gasior, Jakub S.
    Laba, Jonasz
    Korzeniewski, Kacper
    Mlynczak, Marcel
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [10] Performance of GPT-3.5 and GPT-4 on the Korean Pharmacist Licensing Examination: Comparison Study
    Jin, Hye Kyung
    Kim, Eunyoung
    JMIR MEDICAL EDUCATION, 2024, 10