Evaluating AI-generated patient education materials for spinal surgeries: Comparative analysis of readability and DISCERN quality across ChatGPT and DeepSeek models

Cited: 0
Authors
Zhou, Mi [1 ]
Pan, Yun [2 ]
Zhang, Yuye [3 ]
Song, Xiaomei [4 ]
Zhou, Youbin [5 ]
Affiliations
[1] Univ South Australia, Allied Hlth & Human Performance, Adelaide, Australia
[2] Soochow Univ, Affiliated Hosp 2, Dept Cardiovasc Med, Suzhou, Jiangsu, Peoples R China
[3] Soochow Univ, Affiliated Hosp 2, Dept Orthopaed, Suzhou, Jiangsu, Peoples R China
[4] Soochow Univ, Affiliated Hosp 2, Dept Nursing, Suzhou, Jiangsu, Peoples R China
[5] Jinling Inst Technol, Coll Intelligent Sci & Control Engn, Nanjing, Peoples R China
Keywords
AI-generated health information; Spinal surgery education; Patient health literacy; Readability; INFORMATION; INTERNET
DOI
10.1016/j.ijmedinf.2025.105871
Chinese Library Classification
TP [automation and computer technology]
Discipline Classification Code
0812
Abstract
Background: Access to patient-centered health information is essential for informed decision-making. However, online medical resources vary in quality and often fail to accommodate differing degrees of health literacy. This issue is particularly evident in surgical contexts, where complex terminology obstructs patient comprehension. As patients increasingly rely on AI models for supplementary medical information, the reliability and readability of AI-generated content require thorough evaluation.

Objective: This study evaluated four natural language processing models (ChatGPT-4o, ChatGPT-o3 mini, DeepSeek-V3, and DeepSeek-R1) on generating patient education materials for three common spinal surgeries: lumbar discectomy, spinal fusion, and decompressive laminectomy. Information quality was evaluated using the DISCERN instrument, and readability was assessed with Flesch-Kincaid indices.

Results: DeepSeek-R1 produced the most readable responses, with Flesch-Kincaid Grade Level (FKGL) scores ranging from 7.2 to 9.0, followed by ChatGPT-4o. In contrast, ChatGPT-o3 mini exhibited the lowest readability (FKGL > 10.4). DISCERN scores for all models fell below 60, classifying the information quality as "fair," primarily because of insufficient cited references.

Conclusion: All models achieved only a "fair" quality rating, underscoring the need for better citation practices and personalization. Nonetheless, DeepSeek-R1 and ChatGPT-4o generated more readable surgical information than ChatGPT-o3 mini. Because enhanced readability can improve patient engagement, reduce anxiety, and contribute to better surgical outcomes, these two models should be prioritized for assisting patients in clinical settings.

Limitations & Future directions: This study is limited by the rapid evolution of AI models, its exclusive focus on spinal surgery education, and the absence of real-world patient feedback, which may affect the generalizability and long-term applicability of the findings. Future research should explore interactive, multimodal approaches and incorporate patient feedback to ensure that AI-generated health information is accurate, accessible, and supports informed healthcare decisions.
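For context on the readability metric, the standard FKGL formula is 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59. The Python sketch below shows how such scores could be reproduced; it is illustrative only. The syllable counter is a crude vowel-group heuristic (published analyses typically use dictionary-based counters), the sample sentence is invented, and the record does not describe the study's actual tooling.

import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count vowel groups, discount a likely silent final 'e'.
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def flesch_kincaid_grade(text: str) -> float:
    # Standard FKGL: 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / max(len(sentences), 1)
            + 11.8 * syllables / max(len(words), 1)
            - 15.59)

if __name__ == "__main__":
    # Hypothetical patient-education sentence, not taken from the study.
    sample = ("Lumbar discectomy removes part of a damaged disc. "
              "Most patients go home the same day.")
    print(f"FKGL = {flesch_kincaid_grade(sample):.1f}")

A lower FKGL corresponds to an earlier school grade; the 7.2-9.0 range reported for DeepSeek-R1 sits closest to the sixth-to-eighth-grade level commonly recommended for patient materials.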
Pages: 5