Digesting Digital Health: A Study of Appropriateness and Readability of ChatGPT-Generated Gastroenterological Information

Cited by: 1
Authors
Toiv, Avi [1]
Saleh, Zachary [2]
Ishak, Angela [1]
Alsheik, Eva [2]
Venkat, Deepak [2]
Nandi, Neilanjan [3]
Zuchelli, Tobias E. [2]
Affiliations
[1] Henry Ford Hospital, Department of Internal Medicine, Detroit, MI, USA
[2] Henry Ford Hospital, Division of Gastroenterology & Hepatology, Detroit, MI, USA
[3] University of Pennsylvania, Division of Gastroenterology & Hepatology, Philadelphia, PA 19104, USA
Keywords
natural language processing; AI; artificial intelligence; medical terminology; gastroenterology; education materials; quality
DOI
10.14309/ctg.0000000000000765
Chinese Library Classification
R57 [Digestive System and Abdominal Diseases]
Subject Classification
Abstract
INTRODUCTION: The advent of artificial intelligence-powered large language models capable of generating interactive responses to intricate queries marks a groundbreaking development in how patients access medical information. Our aim was to evaluate the appropriateness and readability of gastroenterological information generated by Chat Generative Pretrained Transformer (ChatGPT).

METHODS: We analyzed responses generated by ChatGPT to 16 dialog-based queries assessing symptoms and treatments for gastrointestinal conditions and 13 definition-based queries on prevalent topics in gastroenterology. Three board-certified gastroenterologists evaluated output appropriateness with a 5-point Likert-scale proxy measurement of currency, relevance, accuracy, comprehensiveness, clarity, and urgency/next steps. Outputs with a score of 4 or 5 in all 6 categories were designated as "appropriate." Output readability was assessed with the Flesch Reading Ease score, the Flesch-Kincaid Grade Level, and the Simple Measure of Gobbledygook (SMOG) score.

RESULTS: ChatGPT responses to 44% of the 16 dialog-based and 69% of the 13 definition-based questions were deemed appropriate, and the proportion of appropriate responses within the 2 groups of questions was not significantly different (P = 0.17). Notably, none of ChatGPT's responses to questions related to gastrointestinal emergencies were designated appropriate. The mean readability scores showed that outputs were written at a college-level reading proficiency.

DISCUSSION: ChatGPT can produce generally fitting responses to gastroenterological medical queries, but the responses were constrained in appropriateness and readability, which limits the current utility of this large language model. Substantial development is essential before these models can be unequivocally endorsed as reliable sources of medical information.
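The three readability metrics named in METHODS are standard functions of sentence, word, and syllable counts. The study does not publish its scoring pipeline, so the following is only a minimal sketch of how such scores are computed; the syllable counter is a crude vowel-group heuristic assumed for illustration, whereas published studies typically use calibrated tools.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, subtracting a trailing silent 'e'."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)

def readability_scores(text: str) -> dict:
    """Compute Flesch Reading Ease, Flesch-Kincaid Grade Level, and SMOG
    from raw counts of sentences, words, and syllables."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)

    fre = 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)
    fkgl = 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59
    smog = 1.0430 * (polysyllables * (30 / sentences)) ** 0.5 + 3.1291

    return {"Flesch Reading Ease": fre,
            "Flesch-Kincaid Grade Level": fkgl,
            "SMOG": smog}
```

A Flesch Reading Ease score below roughly 50, or a Flesch-Kincaid grade above 12, corresponds to the college-level reading proficiency reported in RESULTS.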
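The appropriateness rule (a score of 4 or 5 in all six Likert categories) and the group comparison can likewise be sketched. The abstract does not name the statistical test behind P = 0.17; an uncorrected chi-square test on the 2x2 table of appropriate vs. not-appropriate responses (7/16 dialog-based, 9/13 definition-based) reproduces that value, so it is assumed here.

```python
from scipy.stats import chi2_contingency

def is_appropriate(likert_scores: dict) -> bool:
    """Abstract's rule: appropriate only if every one of the six categories
    (currency, relevance, accuracy, comprehensiveness, clarity,
    urgency/next steps) scores 4 or 5."""
    return all(score >= 4 for score in likert_scores.values())

# 2x2 contingency table, rows = question group,
# columns = [appropriate, not appropriate]:
# 44% of 16 dialog-based = 7/16; 69% of 13 definition-based = 9/13.
table = [[7, 9], [9, 4]]
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {chi2:.2f}, P = {p:.2f}")  # P ≈ 0.17
```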
Pages: 9
Related Papers (50 in total; 10 shown below)
• [1] Fazilat, Alexander Z.; Brenac, Camille; Kawamoto-Duran, Danae; Berry, Charlotte E.; Alyono, Jennifer; Chang, Michael T.; Liu, David T.; Patel, Zara M.; Tringali, Stéphane; Wan, Derrick C.; Fieux, Maxime. Evaluating the quality and readability of ChatGPT-generated patient-facing medical information in rhinology. European Archives of Oto-Rhino-Laryngology, 2025, 282(4): 1911-1920.
• [2] Abdullah, Abiha; Maze, Karleigh J.; Brock, Bethany A.; Smith, Burkely; Chu, Daniel I.; Jones, Bayley; Wood, Lauren; Giri, Oviya A.; Rubyan, Michael; Morris, Melanie. Assessing Variability in the Readability of ChatGPT-Generated Education Material in Surgery. Journal of the American College of Surgeons, 2023, 237(5): S97.
• [3] Hatia, Arjeta; Doldo, Tiziana; Parrini, Stefano; Chisci, Elettra; Cipriani, Linda; Montagna, Livia; Lagana, Giuseppina; Guenza, Guia; Agosta, Edoardo; Vinjolli, Franceska; Hoxha, Meladiona; D'Amelio, Claudio; Favaretto, Nicolo; Chisci, Glauco. Accuracy and Completeness of ChatGPT-Generated Information on Interceptive Orthodontics: A Multicenter Collaborative Study. Journal of Clinical Medicine, 2024, 13(3).
• [4] Piersson, A. D.; Dzefi-Tettey, K. Accuracy and readability of patient-focused information on obstetrics ultrasound imaging from online sources versus ChatGPT-generated. Ultrasound in Obstetrics & Gynecology, 2023, 62: 1-2.
• [5] Momenaei, Bita; Wakabayashi, Taku; Shahlaee, Abtin; Durrani, Asad F.; Pandit, Saagar A.; Wang, Kristine; Mansour, Hana A.; Abishek, Robert M.; Xu, David; Sridhar, Jayanth; Yonekawa, Yoshihiro; Kuriyan, Ajay E. Appropriateness and Readability of ChatGPT-4-Generated Responses for Surgical Treatment of Retinal Diseases. Ophthalmology, 2023, 130(11): 1105.
• [6] Momenaei, Bita; Wakabayashi, Taku; Shahlaee, Abtin; Durrani, Asad F.; Pandit, Saagar A.; Wang, Kristine; Mansour, Hana A.; Abishek, Robert M.; Xu, David; Sridhar, Jayanth; Yonekawa, Yoshihiro; Kuriyan, Ajay E. Appropriateness and Readability of ChatGPT-4-Generated Responses for Surgical Treatment of Retinal Diseases. Ophthalmology Retina, 2023, 7(10): 862-868.
• [7] Aydin, Fahri Onur; Aksoy, Burakhan Kursat; Ceylan, Ali; Akbas, Yusuf Berk; Ermis, Serhat; Yildiz, Burcin Kepez; Yildirim, Yusuf. Readability and Appropriateness of Responses Generated by ChatGPT 3.5, ChatGPT 4.0, Gemini, and Microsoft Copilot for FAQs in Refractive Surgery. Turk Oftalmoloji Dergisi-Turkish Journal of Ophthalmology, 2024, 54(6): 313-317.
• [8] Vaira, Luigi Angelo; Lechien, Jerome R.; Abbate, Vincenzo; Allevi, Fabiana; Audino, Giovanni; Beltramini, Giada Anna; Bergonzani, Michela; Bolzoni, Alessandro; Committeri, Umberto; Crimi, Salvatore; Gabriele, Guido; Lonardi, Fabio; Maglitto, Fabio; Petrocelli, Marzia; Pucci, Resi; Saponaro, Gianmarco; Tel, Alessandro; Vellone, Valentino; Chiesa-Estomba, Carlos Miguel; Boscolo-Rizzo, Paolo; Salzano, Giovanni; De Riu, Giacomo. Accuracy of ChatGPT-Generated Information on Head and Neck and Oromaxillofacial Surgery: A Multicenter Collaborative Analysis. Otolaryngology-Head and Neck Surgery, 2024, 170(6): 1492-1503.
• [9] Liao, Wenxiong; Liu, Zhengliang; Dai, Haixing; Xu, Shaochen; Wu, Zihao; Zhang, Yiyang; Huang, Xiaoke; Zhu, Dajiang; Cai, Hongmin; Li, Quanzheng; Liu, Tianming; Li, Xiang. Differentiating ChatGPT-Generated and Human-Written Medical Texts: Quantitative Study. JMIR Medical Education, 2023, 9.
• [10] Makiev, Konstantinos G.; Asimakidou, Maria; Vasios, Ioannis S.; Keskinis, Anthimos; Petkidis, Georgios; Tilkeridis, Konstantinos; Ververidis, Athanasios; Iliopoulos, Efthymios. A Study on Distinguishing ChatGPT-Generated and Human-Written Orthopaedic Abstracts by Reviewers: Decoding the Discrepancies. Cureus Journal of Medical Science, 2023, 15(11).