Usefulness of the large language model ChatGPT (GPT-4) as a diagnostic tool and information source in dermatology

Cited by: 1
Authors
Nielsen, Jacob P. S. [1,4]
Gronhoj, Christian [1]
Skov, Lone [2,3]
Gyldenlove, Mette [2,3]
Affiliations
[1] Copenhagen Univ Hosp, Dept Otorhinolaryngol Head & Neck Surg & Audiol, Copenhagen, Denmark
[2] Copenhagen Univ Hosp Herlev & Gentofte, Dept Dermatol & Allergy, Copenhagen, Denmark
[3] Univ Copenhagen, Fac Hlth & Med Sci, Dept Clin Med, Copenhagen, Denmark
[4] Copenhagen Univ Hosp, Dept Otorhinolaryngol Head & Neck Surg & Audiol, Rigshosp, Blegdamsvej 9, DK-2100 Copenhagen, Denmark
Source
JEADV CLINICAL PRACTICE | 2024 / Vol. 3 / Issue 05
Keywords
AI; artificial intelligence; Chatbot; ChatGPT; clinical dermatology; GPT-4; information source; Large Language Model; LLM; skin disease
DOI
10.1002/jvc2.459
Chinese Library Classification number
R75 [Dermatology and Venereology]
Discipline classification code
100206
Abstract
Background: The field of artificial intelligence is rapidly evolving. As an easily accessible platform with vast user engagement, the Chat Generative Pre-Trained Transformer (ChatGPT) holds great promise in medicine, with the latest version, GPT-4, capable of analyzing clinical images.
Objectives: To evaluate ChatGPT as a diagnostic tool and information source in clinical dermatology.
Methods: A total of 15 clinical images were selected from the Danish web atlas, Danderm, depicting various common and rare skin conditions. The images were uploaded to ChatGPT version GPT-4, which was prompted with 'Please provide a description, a potential diagnosis, and treatment options for the following dermatological condition'. The generated responses were assessed by senior registrars in dermatology and consultant dermatologists in terms of accuracy, relevance, and depth (scale 1-5), and in addition, the image quality was rated (scale 0-10). Demographic and professional information about the respondents was registered.
Results: A total of 23 physicians participated in the study. The majority of the respondents were consultant dermatologists (83%), and 48% had more than 10 years of training. The overall image quality had a median rating of 10 out of 10 [interquartile range (IQR): 9-10]. The overall median rating of the ChatGPT generated responses was 2 (IQR: 1-4), while overall median ratings in terms of relevance, accuracy, and depth were 2 (IQR: 1-4), 3 (IQR: 2-4) and 2 (IQR: 1-3), respectively.
Conclusions: Despite the advancements in ChatGPT, including newly added image processing capabilities, the chatbot demonstrated significant limitations in providing reliable and clinically useful responses to illustrative images of various dermatological conditions.
Pages: 1570-1575
Number of pages: 6