Comparative analysis of ChatGPT and Bard in answering pathology examination questions requiring image interpretation

Cited by: 8
Authors
Apornvirat, Sompon [1,2]
Namboonlue, Chutimon [3]
Laohawetwanit, Thiyaphat [1,2]
Affiliations
[1] Thammasat Univ, Chulabhorn Int Coll Med, Div Pathol, Pathum Thani, Thailand
[2] Thammasat Univ Hosp, Div Pathol, Pathum Thani, Thailand
[3] Dr Pong Clin, Bangkok, Thailand
Keywords
artificial intelligence; pathology; diagnosis
DOI
10.1093/ajcp/aqae036
Chinese Library Classification (CLC)
R36 [Pathology]
Subject classification code
100104
Abstract
Objectives: To evaluate the accuracy of ChatGPT and Bard in answering pathology examination questions requiring image interpretation.
Methods: The study evaluated the performance of ChatGPT-4 and Bard using 86 multiple-choice questions, of which 17 (19.8%) covered general pathology and 69 (80.2%) systemic pathology. Of these, 62 (72.1%) included microscopic images, and 57 (66.3%) were first-order questions focused on diagnosing the disease. The authors presented these artificial intelligence (AI) tools with the questions, both with and without clinical context, and assessed their answers against a reference standard set by pathologists.
Results: ChatGPT-4 achieved 100% (n = 86) accuracy on questions with clinical context, surpassing Bard's 87.2% (n = 75). Without context, the accuracy of both AI tools declined significantly, to 52.3% (n = 45) for ChatGPT-4 and 38.4% (n = 33) for Bard. ChatGPT-4 consistently outperformed Bard across categories, particularly on systemic pathology and first-order questions. A notable issue was Bard's tendency to "hallucinate," that is, to provide plausible but incorrect answers, especially without clinical context.
Conclusions: This study demonstrated the potential of ChatGPT and Bard in pathology education, stressing the importance of clinical context for accurate AI interpretation of pathology images. It underlined the need for careful integration of AI into medical education.
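The evaluation described in the Methods reduces to grading each answer against the pathologist reference standard and tallying accuracy separately for the with-context and without-context conditions. The minimal Python sketch below illustrates that tally; the GradedQuestion fields and the accuracy helper are assumptions for illustration, not the authors' actual scoring code.

    # A minimal sketch of the scoring procedure described in the Methods:
    # each multiple-choice question is graded against a pathologist-set
    # reference answer, separately for prompts given with and without
    # clinical context. All names below are illustrative assumptions.
    from dataclasses import dataclass

    @dataclass
    class GradedQuestion:
        reference: str            # correct choice per the reference standard
        answer_with_context: str  # AI answer when clinical context was given
        answer_no_context: str    # AI answer without clinical context

    def accuracy(questions: list[GradedQuestion], with_context: bool) -> float:
        """Fraction of questions answered correctly under one condition."""
        correct = sum(
            (q.answer_with_context if with_context else q.answer_no_context)
            == q.reference
            for q in questions
        )
        return correct / len(questions)

    # Sanity check against the reported ChatGPT-4 figures (86 questions):
    # 86/86 correct with context -> 100%; 45/86 without -> 52.3%.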
Pages: 252-260
Number of pages: 9