Usefulness of the large language model ChatGPT (GPT-4) as a diagnostic tool and information source in dermatology

Times cited: 1
Authors
Nielsen, Jacob P. S. [1 ,4 ]
Gronhoj, Christian [1 ]
Skov, Lone [2 ,3 ]
Gyldenlove, Mette [2 ,3 ]
Affiliations
[1] Copenhagen Univ Hosp, Dept Otorhinolaryngol Head & Neck Surg & Audiol, Copenhagen, Denmark
[2] Copenhagen Univ Hosp Herlev & Gentofte, Dept Dermatol & Allergy, Copenhagen, Denmark
[3] Univ Copenhagen, Fac Hlth & Med Sci, Dept Clin Med, Copenhagen, Denmark
[4] Copenhagen Univ Hosp, Dept Otorhinolaryngol Head & Neck Surg & Audiol, Rigshosp, Blegdamsvej 9, DK-2100 Copenhagen, Denmark
Source
JEADV CLINICAL PRACTICE | 2024, Vol. 3, Issue 5
Keywords
AI; artificial intelligence; Chatbot; ChatGPT; clinical dermatology; GPT-4; information source; Large Language Model; LLM; skin disease
DOI
10.1002/jvc2.459
Chinese Library Classification
R75 [Dermatology and Venereology]
Discipline code
100206
Abstract
Background: The field of artificial intelligence is rapidly evolving. As an easily accessible platform with vast user engagement, the Chat Generative Pre-Trained Transformer (ChatGPT) holds great promise in medicine, with the latest version, GPT-4, capable of analyzing clinical images.
Objectives: To evaluate ChatGPT as a diagnostic tool and information source in clinical dermatology.
Methods: A total of 15 clinical images were selected from the Danish web atlas, Danderm, depicting various common and rare skin conditions. The images were uploaded to ChatGPT version GPT-4, which was prompted with 'Please provide a description, a potential diagnosis, and treatment options for the following dermatological condition'. The generated responses were assessed by senior registrars in dermatology and consultant dermatologists in terms of accuracy, relevance, and depth (scale 1-5); in addition, the image quality was rated (scale 0-10). Demographic and professional information about the respondents was registered.
Results: A total of 23 physicians participated in the study. The majority of the respondents were consultant dermatologists (83%), and 48% had more than 10 years of training. The overall image quality had a median rating of 10 out of 10 [interquartile range (IQR): 9-10]. The overall median rating of the ChatGPT-generated responses was 2 (IQR: 1-4), while the overall median ratings for relevance, accuracy, and depth were 2 (IQR: 1-4), 3 (IQR: 2-4), and 2 (IQR: 1-3), respectively.
Conclusions: Despite the advancements in ChatGPT, including newly added image processing capabilities, the chatbot demonstrated significant limitations in providing reliable and clinically useful responses to illustrative images of various dermatological conditions.
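The study summarizes reviewer scores as medians with interquartile ranges. A minimal sketch of that computation using Python's standard library is shown below; the rating values are illustrative placeholders invented for the example, not the study's actual data:

```python
from statistics import median, quantiles

# Hypothetical 1-5 ratings from 23 reviewers for one ChatGPT response
# (placeholder data; the study's raw ratings are not published in this record).
ratings = [1, 2, 2, 1, 4, 3, 2, 1, 5, 2, 3, 1, 2, 4, 2, 1, 3, 2, 2, 1, 4, 2, 3]

med = median(ratings)
# quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3;
# method="inclusive" interpolates within the observed data range.
q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
print(f"median = {med}, IQR = {q1}-{q3}")
```

The same pattern applies per rating dimension (relevance, accuracy, depth): collect each reviewer's score into a list and report the median with Q1-Q3 as the IQR.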
Pages: 1570-1575
Number of pages: 6
Related papers
(50 records)
  • [31] The Accuracy of the Multimodal Large Language Model GPT-4 on Sample Questions From the Interventional Radiology Board Examination Response
    Ariyaratne, Sisith
    Jenko, Nathan
    Davies, A. Mark
    Iyengar, Karthikeyan P.
    Botchu, Rajesh
    ACADEMIC RADIOLOGY, 2024, 31 (08) : 3477 - 3477
  • [32] Evaluating Large Language Model-Assisted Emergency Triage: A Comparison of Acuity Assessments by GPT-4 and Medical Experts
    Haim, Gal Ben
    Saban, Mor
    Barash, Yiftach
    Cirulnik, David
    Shaham, Amit
    Eisenman, Ben Zion
    Burshtein, Livnat
    Mymon, Orly
    Klang, Eyal
    JOURNAL OF CLINICAL NURSING, 2024,
  • [33] Investigating the clinical reasoning abilities of large language model GPT-4: an analysis of postoperative complications from renal surgeries
    Hsueh, Jessica Y.
    Nethala, Daniel
    Singh, Shiva
    Linehan, W. Marston
    Ball, Mark W.
    UROLOGIC ONCOLOGY-SEMINARS AND ORIGINAL INVESTIGATIONS, 2024, 42 (09) : 292.e1 - 292.e7
  • [34] Large-Scale Validation of the Feasibility of GPT-4 as a Proofreading Tool for Head CT Reports
    Kim, Songsoo
    Kim, Donghyun
    Shin, Hyun Joo
    Lee, Seung Hyun
    Kang, Yeseul
    Jeong, Sejin
    Kim, Jaewoong
    Han, Miran
    Lee, Seong-Joon
    Kim, Joonho
    Yum, Jungyon
    Han, Changho
    Yoon, Dukyong
    RADIOLOGY, 2025, 314 (01)
  • [35] Diagnostic accuracy of a large language model in rheumatology: comparison of physician and ChatGPT-4
    Krusche, Martin
    Callhoff, Johnna
    Knitza, Johannes
    Ruffer, Nikolas
    RHEUMATOLOGY INTERNATIONAL, 2024, 44 (02) : 303 - 306
  • [37] Evaluating Large Language Models for the National Premedical Exam in India: Comparative Analysis of GPT-3.5, GPT-4, and Bard
    Farhat, Faiza
    Chaudhry, Beenish Moalla
    Nadeem, Mohammad
    Sohail, Shahab Saquib
    Madsen, Dag Oivind
    JMIR MEDICAL EDUCATION, 2024, 10
  • [38] Artificial intelligence large language model ChatGPT: is it a trustworthy and reliable source of information for sarcoma patients?
    Valentini, Marisa
    Szkandera, Joanna
    Smolle, Maria
    Scheipl, Susanne
    Leithner, Andreas
    Andreou, Dimosthenis
    FRONTIERS IN PUBLIC HEALTH, 2024, 12
  • [39] Stratified Evaluation of Large Language Model GPT-4's Question-Answering In Surgery reveals AI Knowledge Gaps
    Lonergan, Rebecca Murphy
    Curry, Jake
    Dhas, Kallpana
    Simmons, Benno
    BRITISH JOURNAL OF SURGERY, 2024, 111
  • [40] Fine-Tuning Large Language Models for Ontology Engineering: A Comparative Analysis of GPT-4 and Mistral
    Doumanas, Dimitrios
    Soularidis, Andreas
    Spiliotopoulos, Dimitris
    Vassilakis, Costas
    Kotis, Konstantinos
    APPLIED SCIENCES-BASEL, 2025, 15 (04)