Utility and Comparative Performance of Current Artificial Intelligence Large Language Models as Postoperative Medical Support Chatbots in Aesthetic Surgery

Cited by: 6
Authors
Abi-Rafeh, Jad [1 ]
Henry, Nader [1 ]
Xu, Hong Hao [2 ]
Bassiri-Tehrani, Brian
Arezki, Adel [3 ]
Kazan, Roy [1 ]
Gilardino, Mirko S. [1 ]
Nahai, Foad [4 ,5 ]
Affiliations
[1] McGill Univ Hlth Ctr, Div Plast Reconstruct & Aesthet Surg, Montreal, PQ, Canada
[2] Laval Univ, Dept Med, Quebec City, PQ, Canada
[3] McGill Univ Hlth Ctr, Div Urol, Montreal, PQ, Canada
[4] Emory Univ, Div Plast & Reconstruct Surg, Sch Med, Atlanta, GA USA
[5] 875 Johnson Ferry Rd NE, Atlanta, GA 30304 USA
Keywords
BREAST IMPLANT ILLNESS;
DOI
10.1093/asj/sjae025
CLC Classification
R61 [Operative Surgery];
Subject Classification
Abstract
Background: Large language models (LLMs) have revolutionized the way plastic surgeons and their patients can access and leverage artificial intelligence (AI).
Objectives: The present study aims to compare the performance of 2 current publicly available and patient-accessible LLMs in the potential application of AI as postoperative medical support chatbots in an aesthetic surgeon's practice.
Methods: Twenty-two simulated postoperative patient presentations following aesthetic breast plastic surgery were devised and expert-validated. Complications varied in their latency within the postoperative period, as well as in the urgency of required medical attention. In response to each patient-reported presentation, OpenAI's ChatGPT and Google's Bard, in their unmodified and freely available versions, were objectively assessed for their comparative accuracy in generating an appropriate differential diagnosis, most-likely diagnosis, suggested medical disposition, treatments or interventions to begin from home, and/or red flag signs/symptoms indicating deterioration.
Results: ChatGPT cumulatively and significantly outperformed Bard across all objective assessment metrics examined (66% vs 55%, respectively; P < .05). Accuracy in generating an appropriate differential diagnosis was 61% for ChatGPT vs 57% for Bard (P = .45). ChatGPT asked an average of 9.2 questions on history vs Bard's 6.8 questions (P < .001), with accuracies of 91% vs 68%, respectively, in reporting the most-likely diagnosis (P < .01). Appropriate medical dispositions were suggested with accuracies of 50% by ChatGPT vs 41% by Bard (P = .40); appropriate home interventions/treatments with accuracies of 59% vs 55% (P = .94); and red flag signs/symptoms with accuracies of 79% vs 54% (P < .01), respectively. Detailed and comparative performance breakdowns according to complication latency and urgency are presented.
Conclusions: ChatGPT represents the superior LLM for the potential application of AI technology in postoperative medical support chatbots. Its imperfect performance and the limitations discussed may guide the refinement necessary to facilitate adoption.
Pages: 889-896
Page count: 8
Related Papers
(50 records total)
  • [31] Using Artificial Intelligence to Generate Medical Literature for Patients: A Comparison of Three Different Large Language Models
    Pompili, D.
    Richa, Y.
    Collins, P.
    Hennessey, D. B.
    BRITISH JOURNAL OF SURGERY, 2024, 111
  • [32] Comparative Analysis of Artificial Intelligence Virtual Assistant and Large Language Models in Post-Operative Care
    Borna, Sahar
    Gomez-Cabello, Cesar A.
    Pressman, Sophia M.
    Haider, Syed Ali
    Sehgal, Ajai
    Leibovich, Bradley C.
    Cole, Dave
    Forte, Antonio Jorge
    EUROPEAN JOURNAL OF INVESTIGATION IN HEALTH PSYCHOLOGY AND EDUCATION, 2024, 14 (05) : 1413 - 1424
  • [33] Future of artificial intelligence in plastic surgery: Toward the development of specialty-specific large language models
    Ozmen, Berk B.
    Schwarz, Graham S.
    JOURNAL OF PLASTIC RECONSTRUCTIVE AND AESTHETIC SURGERY, 2024, 93 : 70 - 71
  • [34] The Comparative Performance of Large Language Models on the Hand Surgery Self-Assessment Examination
    Chen, Clark J.
    Sobol, Keenan
    Hickey, Connor
    Raphael, James
    HAND-AMERICAN ASSOCIATION FOR HAND SURGERY, 2024,
  • [35] Comparative performance of artificial intelligence-based large language models on the orthopedic in-training examination
    Xu, Andrew Y.
    Singh, Manjot
    Balmaceno-Criss, Mariah
    Oh, Allison
    Leigh, David
    Daher, Mohammad
    Alsoof, Daniel
    Mcdonald, Christopher L.
    Diebo, Bassel G.
    Daniels, Alan H.
    JOURNAL OF ORTHOPAEDIC SURGERY, 2025, 33 (01)
  • [36] The performance of artificial intelligence chatbot large language models to address skeletal biology and bone health queries
    Cung, Michelle
    Sosa, Branden
    Yang, He S.
    McDonald, Michelle M.
    Matthews, Brya G.
    Vlug, Annegreet G.
    Imel, Erik A.
    Wein, Marc N.
    Stein, Emily Margaret
    Greenblatt, Matthew B.
    JOURNAL OF BONE AND MINERAL RESEARCH, 2024, 39 (02) : 106 - 115
  • [37] Medical Applications of Artificial Intelligence and Large Language Models: Bibliometric Analysis and Stern Call for Improved Publishing Practices
    Abi-Rafeh, Jad
    Xu, Hong Hao
    Kazan, Roy
    Furnas, Heather J.
    AESTHETIC SURGERY JOURNAL, 2023, 43 (12) : NP1098 - NP1100
  • [38] Rule-Augmented Artificial Intelligence-empowered Systems for Medical Diagnosis using Large Language Models
    Panagoulias, Dimitrios P.
    Palamidas, Filippos A.
    Virvou, Maria
    Tsihrintzis, George A.
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 70 - 77
  • [39] Using artificial intelligence to generate medical literature for urology patients: a comparison of three different large language models
    Pompili, David
    Richa, Yasmina
    Collins, Patrick
    Richards, Helen
    Hennessey, Derek B.
    WORLD JOURNAL OF UROLOGY, 2024, 42 (01)
  • [40] Enhanced Artificial Intelligence in Bladder Cancer Management: A Comparative Analysis and Optimization Study of Multiple Large Language Models
    Li, Kun-peng
    Wang, Li
    Wan, Shun
    Wang, Chen-yang
    Chen, Si-yu
    Liu, Shan-hui
    Yang, Li
    JOURNAL OF ENDOUROLOGY, 2025,