Evaluation of the quality and quantity of artificial intelligence-generated responses about anesthesia and surgery: using ChatGPT 3.5 and 4.0

Cited by: 1
Authors
Choi, Jisun [1 ]
Oh, Ah Ran [1 ]
Park, Jungchan [1 ]
Kang, Ryung A. [1 ]
Yoo, Seung Yeon [1 ]
Lee, Dong Jae [1 ]
Yang, Kwangmo [2 ]
Affiliations
[1] Sungkyunkwan Univ, Sch Med, Samsung Med Ctr, Dept Anesthesiol & Pain Med, Seoul, South Korea
[2] Sungkyunkwan Univ, Ctr Hlth Promot, Samsung Med Ctr, Sch Med, Seoul, South Korea
Keywords
ChatGPT; artificial intelligence; quality; quantity; AI chatbot
DOI
10.3389/fmed.2024.1400153
CLC number
R5 [Internal Medicine]
Discipline codes
1002; 100201
Abstract
Introduction: The large-scale artificial intelligence (AI) language-model chatbot, Chat Generative Pre-Trained Transformer (ChatGPT), is renowned for its ability to provide information quickly and efficiently. This study aimed to assess ChatGPT's medical responses regarding anesthetic procedures.

Methods: Two anesthesiologist authors selected 30 questions representing inquiries patients might have about surgery and anesthesia. These questions were entered into two versions of ChatGPT in English. Thirty-one anesthesiologists then evaluated each response for quality, quantity, and overall assessment using 5-point Likert scales. Descriptive statistics summarized the scores, and a paired-sample t-test compared ChatGPT 3.5 and 4.0.

Results: Regarding quality, "appropriate" was the most common rating for both ChatGPT 3.5 and 4.0 (40% and 48%, respectively). For quantity, responses were deemed "insufficient" in 59% of cases for 3.5 and "adequate" in 69% for 4.0. In the overall assessment, 3 points was the most common rating for 3.5 (36%), whereas 4 points predominated for 4.0 (42%). Mean quality scores were 3.40 and 3.73, and mean quantity scores were -0.31 (between insufficient and adequate) and 0.03 (between adequate and excessive), respectively. Mean overall scores were 3.21 for 3.5 and 3.67 for 4.0. Responses from 4.0 showed statistically significant improvement in all three areas.

Conclusion: ChatGPT generated responses that mostly ranged from appropriate to slightly insufficient, providing an overall average amount of information. Version 4.0 outperformed 3.5, and further research is warranted to investigate the potential utility of AI chatbots in assisting patients with medical information.
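The comparison between versions reduces to a paired-sample t-test on matched ratings, since every question is answered by both models. Below is a minimal sketch of that analysis in Python; the scores are randomly generated placeholders centered on the reported quality means (3.40 and 3.73), not the study's actual data, and all variable names are illustrative.

    # Minimal sketch of the paired comparison described in the abstract.
    # All scores below are randomly generated placeholders, NOT the study's data.
    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(seed=1)
    n_questions = 30  # the study used 30 patient questions

    # Hypothetical per-question mean quality ratings on the 5-point Likert
    # scale, centered on the reported means for ChatGPT 3.5 and 4.0
    quality_35 = rng.normal(loc=3.40, scale=0.5, size=n_questions)
    quality_40 = rng.normal(loc=3.73, scale=0.5, size=n_questions)

    # Paired t-test: each question is rated under both model versions
    t_stat, p_value = ttest_rel(quality_40, quality_35)
    print(f"mean 3.5 = {quality_35.mean():.2f}, mean 4.0 = {quality_40.mean():.2f}")
    print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")

The same procedure would apply to the quantity scores if they are coded numerically (e.g., -1 insufficient, 0 adequate, +1 excessive, which is consistent with the reported means of -0.31 and 0.03) and to the overall scores.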
Pages: 9
Related papers
14 records (10 listed below)
  • [1] Evaluation of Artificial Intelligence-generated Responses to Common Plastic Surgery Questions
    Daungsupawong, Hinpetch
    Wiwanitkit, Viroj
    PLASTIC AND RECONSTRUCTIVE SURGERY-GLOBAL OPEN, 2023, 11 (11)
  • [2] Evaluation of Artificial Intelligence-generated Responses to Common Plastic Surgery Questions
    Copeland-Halperin, Libby R.
    O'Brien, Lauren
    Copeland, Michelle
    PLASTIC AND RECONSTRUCTIVE SURGERY-GLOBAL OPEN, 2023, 11 (08) : E5226
  • [3] Assessing the accuracy and reproducibility of artificial intelligence-generated medical responses by ChatGPT on Scheuermann's kyphosis
    Giray, Esra
    Illeez, Ozge Gulsum
    Korkmaz, Merve Damla
    Capan, Nalan
    Saygi, Evrim Karadag
    Aydin, Resa
    TURKISH JOURNAL OF PHYSICAL MEDICINE AND REHABILITATION, 2024,
  • [4] Readability and Appropriateness of Responses Generated by ChatGPT 3.5, ChatGPT 4.0, Gemini, and Microsoft Copilot for FAQs in Refractive Surgery
    Aydin, Fahri Onur
    Aksoy, Burakhan Kursat
    Ceylan, Ali
    Akbas, Yusuf Berk
    Ermis, Serhat
    Yildiz, Burcin Kepez
    Yildirim, Yusuf
    TURK OFTALMOLOJI DERGISI-TURKISH JOURNAL OF OPHTHALMOLOGY, 2024, 54 (06): : 313 - 317
  • [5] Assessing the Accuracy, Completeness, and Reliability of Artificial Intelligence-Generated Responses in Dentistry: A Pilot Study Evaluating the ChatGPT Model
    Molena, Kelly F.
    Macedo, Ana P.
    Ijaz, Anum
    Carvalho, Fabricio K.
    Gallo, Maria Julia D.
    Silva, Francisco Wanderley Garcia de Paula e
    de Rossi, Andiara
    Mezzomo, Luis A.
    Mugayar, Leda Regina F.
    Queiroz, Alexandra M.
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (07)
  • [6] Dr. Google to Dr. ChatGPT: assessing the content and quality of artificial intelligence-generated medical information on appendicitis
    Ghanem, Yazid K.
    Rouhi, Armaun D.
    Al-Houssan, Ammr
    Saleh, Zena
    Moccia, Matthew C.
    Joshi, Hansa
    Dumon, Kristoffel R.
    Hong, Young
    Spitz, Francis
    Joshi, Amit R.
    Kwiatt, Michael
    SURGICAL ENDOSCOPY AND OTHER INTERVENTIONAL TECHNIQUES, 2024, 38 (05): : 2887 - 2893
  • [7] AUA Guideline Committee Members Determine Quality of Artificial Intelligence-Generated Responses for Female Stress Urinary Incontinence
    Chen, Annie
    Jacob, Jerril
    Hwang, Kuemin
    Kobashi, Kathleen
    Gonzalez, Ricardo R.
    UROLOGY PRACTICE, 2024, 11 (04) : 693 - 698
  • [8] AUA Guideline Committee Members Determine Quality of Artificial Intelligence-Generated Responses for Female Stress Urinary Incontinence Editorial Commentary
    Lemack, Gary E.
    UROLOGY PRACTICE, 2024, 11 (04)
  • [9] Evaluation High-Quality of Information from ChatGPT (Artificial Intelligence-Large Language Model) Artificial Intelligence on Shoulder Stabilization Surgery
    Hurley, Eoghan T.
    Crook, Bryan S.
    Lorentz, Samuel G.
    Danilkowicz, Richard M.
    Lau, Brian C.
    Taylor, Dean C.
    Dickens, Jonathan F.
    Anakwenze, Oke
    Klifto, Christopher S.
    ARTHROSCOPY-THE JOURNAL OF ARTHROSCOPIC AND RELATED SURGERY, 2024, 40 (03): : 726 - 731.e6
  • [10] A Pilot Survey of Patient Perspectives on an Artificial Intelligence-Generated Presenter in a Patient Information Video about Face-Down Positioning after Vitreoretinal Surgery
    Macri, Carmelo Zak
    Bacchi, Stephen
    Wong, Wilson
    Baranage, Duleepa
    Sivagurunathan, Premala Devi
    Chan, Weng Onn
    OPHTHALMIC RESEARCH, 2024, 67 (01) : 567 - 572