Utility and Comparative Performance of Current Artificial Intelligence Large Language Models as Postoperative Medical Support Chatbots in Aesthetic Surgery

Cited by: 6
Authors
Abi-Rafeh, Jad [1]
Henry, Nader [1]
Xu, Hong Hao [2]
Bassiri-Tehrani, Brian
Arezki, Adel [3]
Kazan, Roy [1]
Gilardino, Mirko S. [1]
Nahai, Foad [4, 5]
Affiliations
[1] McGill Univ Hlth Ctr, Div Plast Reconstruct & Aesthet Surg, Montreal, PQ, Canada
[2] Laval Univ, Dept Med, Quebec City, PQ, Canada
[3] McGill Univ Hlth Ctr, Div Urol, Montreal, PQ, Canada
[4] Emory Univ, Div Plast & Reconstruct Surg, Sch Med, Atlanta, GA USA
[5] 875 Johnson Ferry Rd NE, Atlanta, GA 30304 USA
Keywords
BREAST IMPLANT ILLNESS;
DOI
10.1093/asj/sjae025
Chinese Library Classification (CLC) number
R61 [Operative Surgery];
Abstract
Background: Large language models (LLMs) have revolutionized the way plastic surgeons and their patients can access and leverage artificial intelligence (AI).
Objectives: The present study aims to compare the performance of 2 current publicly available and patient-accessible LLMs in the potential application of AI as postoperative medical support chatbots in an aesthetic surgeon's practice.
Methods: Twenty-two simulated postoperative patient presentations following aesthetic breast plastic surgery were devised and expert-validated. Complications varied in their latency within the postoperative period, as well as in the urgency of required medical attention. In response to each patient-reported presentation, OpenAI's ChatGPT and Google's Bard, in their unmodified and freely available versions, were objectively assessed for their comparative accuracy in generating an appropriate differential diagnosis, most-likely diagnosis, suggested medical disposition, treatments or interventions to begin from home, and/or red flag signs/symptoms indicating deterioration.
Results: ChatGPT cumulatively and significantly outperformed Bard across all objective assessment metrics examined (66% vs 55%, respectively; P < .05). Accuracy in generating an appropriate differential diagnosis was 61% for ChatGPT vs 57% for Bard (P = .45). ChatGPT asked an average of 9.2 questions on history vs Bard's 6.8 questions (P < .001), with accuracies of 91% vs 68% in reporting the most-likely diagnosis, respectively (P < .01). Appropriate medical dispositions were suggested with accuracies of 50% by ChatGPT vs 41% by Bard (P = .40); appropriate home interventions/treatments with accuracies of 59% vs 55% (P = .94); and red flag signs/symptoms with accuracies of 79% vs 54% (P < .01), respectively. Detailed and comparative performance breakdowns according to complication latency and urgency are presented.
Conclusions: ChatGPT represents the superior LLM for the potential application of AI technology in postoperative medical support chatbots. Imperfect performance and limitations discussed may guide the necessary refinement to facilitate adoption.
Pages: 889-896
Page count: 8