Utility and Comparative Performance of Current Artificial Intelligence Large Language Models as Postoperative Medical Support Chatbots in Aesthetic Surgery

Cited by: 6
Authors
Abi-Rafeh, Jad [1]
Henry, Nader [1]
Xu, Hong Hao [2]
Bassiri-Tehrani, Brian
Arezki, Adel [3]
Kazan, Roy [1]
Gilardino, Mirko S. [1]
Nahai, Foad [4, 5]
Affiliations
[1] McGill Univ Hlth Ctr, Div Plast Reconstruct & Aesthet Surg, Montreal, PQ, Canada
[2] Laval Univ, Dept Med, Quebec City, PQ, Canada
[3] McGill Univ Hlth Ctr, Div Urol, Montreal, PQ, Canada
[4] Emory Univ, Div Plast & Reconstruct Surg, Sch Med, Atlanta, GA USA
[5] 875 Johnson Ferry Rd NE, Atlanta, GA 30304 USA
Keywords
BREAST IMPLANT ILLNESS;
DOI
10.1093/asj/sjae025
Chinese Library Classification (CLC) number
R61 [Operative Surgery];
Abstract
Background: Large language models (LLMs) have revolutionized the way plastic surgeons and their patients can access and leverage artificial intelligence (AI).
Objectives: The present study aims to compare the performance of 2 current publicly available and patient-accessible LLMs in the potential application of AI as postoperative medical support chatbots in an aesthetic surgeon's practice.
Methods: Twenty-two simulated postoperative patient presentations following aesthetic breast plastic surgery were devised and expert-validated. Complications varied in their latency within the postoperative period, as well as in the urgency of required medical attention. In response to each patient-reported presentation, OpenAI's ChatGPT and Google's Bard, in their unmodified and freely available versions, were objectively assessed for their comparative accuracy in generating an appropriate differential diagnosis, most-likely diagnosis, suggested medical disposition, treatments or interventions to begin from home, and/or red flag signs/symptoms indicating deterioration.
Results: ChatGPT cumulatively and significantly outperformed Bard across all objective assessment metrics examined (66% vs 55%, respectively; P < .05). Accuracy in generating an appropriate differential diagnosis was 61% for ChatGPT vs 57% for Bard (P = .45). ChatGPT asked an average of 9.2 questions on history vs Bard's 6.8 questions (P < .001), and the 2 models reported the most-likely diagnosis with accuracies of 91% vs 68%, respectively (P < .01). Appropriate medical dispositions were suggested with accuracies of 50% by ChatGPT vs 41% by Bard (P = .40), appropriate home interventions/treatments with accuracies of 59% vs 55% (P = .94), and red flag signs/symptoms with accuracies of 79% vs 54% (P < .01), respectively. Detailed and comparative performance breakdowns according to complication latency and urgency are presented.
Conclusions: ChatGPT represents the superior LLM for the potential application of AI technology in postoperative medical support chatbots. Imperfect performance and limitations discussed may guide the necessary refinement to facilitate adoption.
Pages: 889-896
Page count: 8