Evaluating ChatGPT-3.5 and ChatGPT-4.0 Responses on Hyperlipidemia for Patient Education

Cited by: 10
Authors
Lee, Thomas J. [1 ]
Rao, Abhinav K. [2 ]
Campbell, Daniel J. [3 ]
Radfar, Navid [1 ]
Dayal, Manik [1 ]
Khrais, Ayham [1 ]
Affiliations
[1] Rutgers Univ New Jersey, Med Sch, Dept Med, Newark, NJ 07103 USA
[2] Trident Med Ctr, Dept Med, Charleston, SC USA
[3] Thomas Jefferson Univ Hosp, Dept Otolaryngol Head & Neck Surg, Philadelphia, PA USA
Keywords
arrhythmia; patient education; chatgpt; atrial fibrillation; artificial intelligence;
DOI
10.7759/cureus.61067
Chinese Library Classification (CLC) number
R5 [Internal Medicine];
Subject classification codes
1002 ; 100201 ;
Abstract
Introduction: Hyperlipidemia is prevalent worldwide and affects a significant number of US adults. It is a major contributor to ischemic heart disease and to millions of deaths annually. With the increasing use of the internet for health information, tools like ChatGPT (OpenAI, San Francisco, CA, USA) have gained traction. ChatGPT version 4.0, launched in March 2023, offers enhanced features over its predecessor but requires a monthly fee. This study compares the accuracy, comprehensibility, and response length of the free and paid versions of ChatGPT for patient education on hyperlipidemia.

Materials and methods: ChatGPT versions 3.5 and 4.0 were each prompted in three different ways with 25 questions drawn from the Cleveland Clinic's frequently asked questions (FAQs) on hyperlipidemia. Prompts included no prompting (Form 1), patient-friendly prompting (Form 2), and physician-level prompting (Form 3). Responses were categorized as incorrect, partially correct, or correct. Additionally, the grade reading level and word count of each response were recorded for analysis.

Results: Overall, scoring frequencies for ChatGPT version 3.5 were five (6.67%) incorrect, 18 (24.00%) partially correct, and 52 (69.33%) correct. Scoring frequencies for ChatGPT version 4.0 were one (1.33%) incorrect, 18 (24.00%) partially correct, and 56 (74.67%) correct. Correct answers did not significantly differ between ChatGPT version 3.5 and ChatGPT version 4.0 (p = 0.586). ChatGPT version 3.5 had a significantly higher grade reading level than version 4.0 (p = 0.0002) and a significantly higher word count than version 4.0 (p = 0.0073).

Discussion: There was no significant difference in accuracy between the free and paid versions on hyperlipidemia FAQs. Both versions provided accurate but sometimes only partially complete responses. Version 4.0 offered more concise and readable information, aligning with the readability of most online medical resources despite exceeding the National Institutes of Health's (NIH's) recommended eighth-grade reading level. The paid version demonstrated superior adaptability in tailoring responses based on the input.

Conclusion: Both versions of ChatGPT provide reliable medical information, with the paid version offering more adaptable and readable responses. Healthcare providers can recommend ChatGPT as a source of patient education, regardless of the version used. Future research should explore diverse question formulations and ChatGPT's handling of incorrect information.
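The abstract reports a grade reading level and word count for each response but does not name the readability formula the authors used. As an illustrative sketch only (the Flesch-Kincaid grade-level formula, the syllable heuristic, and the sample response text below are assumptions, not details taken from the study), these two metrics could be computed like this:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups; drop one for a silent trailing 'e'."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    """US grade level: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

# Hypothetical patient-education response, used purely for illustration.
response = ("Hyperlipidemia means there are too many fats, such as cholesterol, "
            "in your blood. It raises your risk of heart disease.")
print("Word count:", len(re.findall(r"[A-Za-z']+", response)))
print("Grade level: %.1f" % flesch_kincaid_grade(response))
```

A score above 8.0 would exceed the NIH's recommended eighth-grade reading level for patient materials, which is the benchmark the abstract refers to.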
Pages: 7