Evaluating ChatGPT Responses on Thyroid Nodules for Patient Education

Cited by: 29
Authors:
Campbell, Daniel J. [1 ,2 ]
Estephan, Leonard E. [1 ]
Sina, Elliott M. [1 ]
Mastrolonardo, Eric V. [1 ]
Alapati, Rahul [1 ]
Amin, Dev R. [1 ]
Cottrill, Elizabeth E. [1 ]
Affiliations:
[1] Thomas Jefferson Univ Hosp, Dept Otolaryngol Head & Neck Surg, Philadelphia, PA USA
[2] Thomas Jefferson Univ Hosp, Dept Otolaryngol Head & Neck Surg, 925 Chestnut St,Floor 6, Philadelphia, PA 19107 USA
Keywords:
thyroid nodule; artificial intelligence; patient education; ChatGPT;
DOI: 10.1089/thy.2023.0491
CLC Number: R5 [Internal Medicine]
Subject Classification Codes: 1002; 100201
Abstract:
Background: ChatGPT, an artificial intelligence (AI) chatbot, is the fastest-growing consumer application in history. Given recent trends identifying increasing patient use of Internet sources for self-education, we sought to evaluate the quality of ChatGPT-generated responses for patient education on thyroid nodules. Methods: ChatGPT was queried 4 times with 30 identical questions. Queries differed by initial chatbot prompting: no prompting, patient-friendly prompting, 8th-grade-level prompting, and prompting for references. Answers were scored on a hierarchical scale: incorrect, partially correct, correct, or correct with references. Proportions of responses at incremental score thresholds were compared by prompt type using chi-squared analysis. The Flesch-Kincaid grade level was calculated for each answer, and the relationship between prompt type and grade level was assessed using analysis of variance. References provided within ChatGPT answers were totaled and analyzed for veracity. Results: Across all prompts (n = 120 questions), 83 answers (69.2%) were at least correct. Proportions of responses that were at least partially correct (p = 0.795) and correct (p = 0.402) did not differ by prompt; proportions that were correct with references did (p < 0.0001). Responses from 8th-grade-level prompting had the lowest mean grade level (13.43 +/- 2.86), significantly lower than no prompting (14.97 +/- 2.01, p = 0.01) and prompting for references (16.43 +/- 2.05, p < 0.0001). Prompting for references generated 80/80 (100%) of referenced medical publications within answers. Seventy references (87.5%) were legitimate citations, and 58/80 (72.5%) accurately reported information from the referenced publication. Conclusion: Overall, ChatGPT provides appropriate answers to most questions on thyroid nodules regardless of prompting. Despite targeted prompting strategies, however, it reliably generates responses at grade levels well above accepted recommendations for presenting medical information to patients. Significant rates of AI hallucination may preclude clinicians from recommending the current version of ChatGPT as a patient education tool at this time.
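For reference, the Flesch-Kincaid grade level used in the Methods is a standard published formula: 0.39 * (words/sentence) + 11.8 * (syllables/word) - 15.59. A minimal sketch of the computation is below; the vowel-group syllable counter is a rough heuristic of our own (the study's readability tooling is not specified, and production tools typically use dictionary-based syllable counts):

```python
import re

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level of a passage of English prose."""
    # Split on terminal punctuation; drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)

    def syllables(word: str) -> int:
        # Rough heuristic: count contiguous vowel groups, minimum one.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    total_syllables = sum(syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level coefficients.
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (total_syllables / len(words))
            - 15.59)
```

Under this metric, short common-word sentences score at low (even negative) grade levels, while polysyllabic clinical prose scores well into the teens, which is consistent with the grade-level means reported in the Results.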
Pages: 371-377 (7 pages)
Related Papers (50 total)
  • [1] Evaluating Chatgpt Responses on Atrial Fibrillation for Patient Education
    Lee, Thomas J.
    Campbell, Daniel J.
    Elkattawy, Omar
    Viswanathan, Rohan
    CIRCULATION, 2023, 148
  • [2] Evaluating ChatGPT Responses on Atrial Fibrillation for Patient Education
    Lee, Thomas J.
    Campbell, Daniel J.
    Rao, Abhinav K.
    Hossain, Afif
    Elkattawy, Omar
    Radfar, Navid
    Lee, Paul
    Gardin, Julius M.
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (06)
  • [3] Evaluating ChatGPT responses on obstructive sleep apnea for patient education
    Campbell, Daniel J.
    Estephan, Leonard E.
    Mastrolonardo, Eric V.
    Amin, Dev R.
    Huntley, Colin T.
    Boon, Maurits S.
    JOURNAL OF CLINICAL SLEEP MEDICINE, 2023, 19 (12): 1989-1995
  • [4] Evaluating ChatGPT-3.5 and ChatGPT-4.0 Responses on Hyperlipidemia for Patient Education
    Lee, Thomas J.
    Rao, Abhinav K.
    Campbell, Daniel J.
    Radfar, Navid
    Dayal, Manik
    Khrais, Ayham
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (05)
  • [5] A Novel Approach: Evaluating ChatGPT's Utility for the Management of Thyroid Nodules
    Koeroglu, Ekin Y.
    Faki, Sevguel
    Bestepe, Nagihan
    Tam, Abbas A.
    Seyrek, Neslihan cuhaci
    Topaloglu, Oya
    Ersoy, Reyhan
    Cakir, Bekir
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2023, 15 (10)
  • [6] Accuracy of ChatGPT responses on tracheotomy for patient education
    Khaldi, Amina
    Machayekhi, Shahram
    Salvagno, Michele
    Maniaci, Antonino
    Vaira, Luigi A.
    La Via, Luigi
    Taccone, Fabio S.
    Lechien, Jerome R.
    EUROPEAN ARCHIVES OF OTO-RHINO-LARYNGOLOGY, 2024: 6167-6172
  • [7] EVALUATING CHATGPT AS A SOURCE OF PATIENT EDUCATION MATERIALS ON DYSPAREUNIA
    Huddleson, A.
    Dick-Biascoechea, M.
    JOURNAL OF SEXUAL MEDICINE, 2024, 21
  • [8] Risk stratification of thyroid nodules and ChatGPT
    Daungsupawong, Hinpetch
    Wiwanitkit, Viroj
    AMERICAN JOURNAL OF OTOLARYNGOLOGY, 2024, 45 (03)
  • [9] Evaluating the Success of ChatGPT in Addressing Patient Questions Concerning Thyroid Surgery
    Sahin, Samil
    Tekin, Mustafa Said
    Yigit, Yesim Esen
    Erkmen, Burak
    Duymaz, Yasar Kemal
    Bahsi, Ilhan
    JOURNAL OF CRANIOFACIAL SURGERY, 2024, 35 (06): e572-e575
  • [10] Generative AI in education: ChatGPT-4 in evaluating students' written responses
    Jauhiainen, Jussi S.
    Garagorry Guerra, Agustin
    INNOVATIONS IN EDUCATION AND TEACHING INTERNATIONAL, 2024