Evaluation of the quality and quantity of artificial intelligence-generated responses about anesthesia and surgery: using ChatGPT 3.5 and 4.0

Cited by: 1
Authors
Choi, Jisun [1 ]
Oh, Ah Ran [1 ]
Park, Jungchan [1 ]
Kang, Ryung A. [1 ]
Yoo, Seung Yeon [1 ]
Lee, Dong Jae [1 ]
Yang, Kwangmo [2 ]
Affiliations
[1] Sungkyunkwan Univ, Sch Med, Samsung Med Ctr, Dept Anesthesiol & Pain Med, Seoul, South Korea
[2] Sungkyunkwan Univ, Ctr Hlth Promot, Samsung Med Ctr, Sch Med, Seoul, South Korea
Keywords
ChatGPT; artificial intelligence; quality; quantity; AI chatbot
DOI
10.3389/fmed.2024.1400153
CLC number
R5 [Internal Medicine]
Discipline codes
1002; 100201
Abstract
Introduction: The large-scale artificial intelligence (AI) language-model chatbot, Chat Generative Pre-Trained Transformer (ChatGPT), is renowned for its ability to provide information quickly and efficiently. This study aimed to assess ChatGPT's medical responses regarding anesthetic procedures.

Methods: Two anesthesiologist authors selected 30 questions representing inquiries patients might have about surgery and anesthesia. These questions were entered into two versions of ChatGPT in English. Thirty-one anesthesiologists then evaluated each response for quality, quantity, and overall assessment using 5-point Likert scales. Descriptive statistics summarized the scores, and a paired-sample t-test compared ChatGPT 3.5 and 4.0.

Results: Regarding quality, "appropriate" was the most common rating for both ChatGPT 3.5 and 4.0 (40% and 48%, respectively). For quantity, responses were deemed "insufficient" in 59% of cases for 3.5 and "adequate" in 69% for 4.0. In the overall assessment, 3 points was the most common rating for 3.5 (36%), whereas 4 points predominated for 4.0 (42%). Mean quality scores were 3.40 and 3.73, and mean quantity scores were -0.31 (between insufficient and adequate) and 0.03 (between adequate and excessive), respectively. Mean overall scores were 3.21 for 3.5 and 3.67 for 4.0. Responses from 4.0 showed statistically significant improvement in all three areas.

Conclusion: ChatGPT generated responses that mostly ranged from appropriate to slightly insufficient, providing an overall average amount of information. Version 4.0 outperformed 3.5, and further research is warranted to investigate the potential utility of AI chatbots in assisting patients with medical information.
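The comparison between versions reduces to a paired-sample t-test on matched ratings, since every question is answered by both models. Below is a minimal sketch of that analysis in Python; the scores are randomly generated placeholders centered on the reported quality means (3.40 and 3.73), not the study's actual data, and all variable names are illustrative.

    # Minimal sketch of the paired comparison described in the abstract.
    # All scores below are randomly generated placeholders, NOT the study's data.
    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(seed=1)
    n_questions = 30  # the study used 30 patient questions

    # Hypothetical per-question mean quality ratings on the 5-point Likert
    # scale, centered on the reported means for ChatGPT 3.5 and 4.0
    quality_35 = rng.normal(loc=3.40, scale=0.5, size=n_questions)
    quality_40 = rng.normal(loc=3.73, scale=0.5, size=n_questions)

    # Paired t-test: each question is rated under both model versions
    t_stat, p_value = ttest_rel(quality_40, quality_35)
    print(f"mean 3.5 = {quality_35.mean():.2f}, mean 4.0 = {quality_40.mean():.2f}")
    print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")

The same procedure would apply to the quantity scores if they are coded numerically (e.g., -1 insufficient, 0 adequate, +1 excessive, which is consistent with the reported means of -0.31 and 0.03) and to the overall scores.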
Pages: 9
Related papers
14 records (10 listed below)
  • [1] Evaluation of Artificial Intelligence-generated Responses to Common Plastic Surgery Questions
    Daungsupawong, Hinpetch
    Wiwanitkit, Viroj
    PLASTIC AND RECONSTRUCTIVE SURGERY-GLOBAL OPEN, 2023, 11 (11)
  • [2] Evaluation of Artificial Intelligence-generated Responses to Common Plastic Surgery Questions
    Copeland-Halperin, Libby R.
    O'Brien, Lauren
    Copeland, Michelle
    PLASTIC AND RECONSTRUCTIVE SURGERY-GLOBAL OPEN, 2023, 11 (08) : E5226
  • [3] Assessing the accuracy and reproducibility of artificial intelligence-generated medical responses by ChatGPT on Scheuermann's kyphosis
    Giray, Esra
    Illeez, Ozge Gulsum
    Korkmaz, Merve Damla
    Capan, Nalan
    Saygi, Evrim Karadag
    Aydin, Resa
    TURKISH JOURNAL OF PHYSICAL MEDICINE AND REHABILITATION, 2024,
  • [4] Readability and Appropriateness of Responses Generated by ChatGPT 3.5, ChatGPT 4.0, Gemini, and Microsoft Copilot for FAQs in Refractive Surgery
    Aydin, Fahri Onur
    Aksoy, Burakhan Kursat
    Ceylan, Ali
    Akbas, Yusuf Berk
    Ermis, Serhat
    Yildiz, Burcin Kepez
    Yildirim, Yusuf
    TURK OFTALMOLOJI DERGISI-TURKISH JOURNAL OF OPHTHALMOLOGY, 2024, 54 (06): : 313 - 317
  • [5] Assessing the Accuracy, Completeness, and Reliability of Artificial Intelligence-Generated Responses in Dentistry: A Pilot Study Evaluating the ChatGPT Model
    Molena, Kelly F.
    Macedo, Ana P.
    Ijaz, Anum
    Carvalho, Fabricio K.
    Gallo, Maria Julia D.
    Silva, Francisco Wanderley Garcia de Paula e
    de Rossi, Andiara
    Mezzomo, Luis A.
    Mugayar, Leda Regina F.
    Queiroz, Alexandra M.
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (07)
  • [6] Dr. Google to Dr. ChatGPT: assessing the content and quality of artificial intelligence-generated medical information on appendicitis
    Ghanem, Yazid K.
    Rouhi, Armaun D.
    Al-Houssan, Ammr
    Saleh, Zena
    Moccia, Matthew C.
    Joshi, Hansa
    Dumon, Kristoffel R.
    Hong, Young
    Spitz, Francis
    Joshi, Amit R.
    Kwiatt, Michael
    SURGICAL ENDOSCOPY AND OTHER INTERVENTIONAL TECHNIQUES, 2024, 38 (05): : 2887 - 2893
  • [7] AUA Guideline Committee Members Determine Quality of Artificial Intelligence-Generated Responses for Female Stress Urinary Incontinence
    Chen, Annie
    Jacob, Jerril
    Hwang, Kuemin
    Kobashi, Kathleen
    Gonzalez, Ricardo R.
    UROLOGY PRACTICE, 2024, 11 (04) : 693 - 698
  • [8] AUA Guideline Committee Members Determine Quality of Artificial Intelligence-Generated Responses for Female Stress Urinary Incontinence Editorial Commentary
    Lemack, Gary E.
    UROLOGY PRACTICE, 2024, 11 (04)
  • [9] Evaluation High-Quality of Information from ChatGPT (Artificial Intelligence-Large Language Model) Artificial Intelligence on Shoulder Stabilization Surgery
    Hurley, Eoghan T.
    Crook, Bryan S.
    Lorentz, Samuel G.
    Danilkowicz, Richard M.
    Lau, Brian C.
    Taylor, Dean C.
    Dickens, Jonathan F.
    Anakwenze, Oke
    Klifto, Christopher S.
    ARTHROSCOPY-THE JOURNAL OF ARTHROSCOPIC AND RELATED SURGERY, 2024, 40 (03): : 726 - 731.e6
  • [10] A Pilot Survey of Patient Perspectives on an Artificial Intelligence-Generated Presenter in a Patient Information Video about Face-Down Positioning after Vitreoretinal Surgery
    Macri, Carmelo Zak
    Bacchi, Stephen
    Wong, Wilson
    Baranage, Duleepa
    Sivagurunathan, Premala Devi
    Chan, Weng Onn
    OPHTHALMIC RESEARCH, 2024, 67 (01) : 567 - 572