Response to correspondence regarding "Analysis of large-language model versus human performance for genetics questions"

Cited by: 1
Authors
Duong, Dat [1]
Solomon, Benjamin D. [1]
Affiliations
[1] Natl Human Genome Res Inst, Med Genom Unit, Med Genet Branch, Bethesda, MD 20894 USA
Funding
US National Institutes of Health
Keywords
DOI
10.1038/s41431-023-01444-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Large-language models like ChatGPT have recently received a great deal of attention. One area of interest pertains to how these models could be used in biomedical contexts, including those related to human genetics. To assess one facet of this, we compared the performance of ChatGPT versus human respondents (13,642 human responses) in answering 85 multiple-choice questions about aspects of human genetics. Overall, ChatGPT did not perform significantly differently (p = 0.8327) from human respondents; ChatGPT was 68.2% accurate, compared to 66.6% accuracy for human respondents. Both ChatGPT and humans performed better on memorization-type questions than on critical thinking questions (p < 0.0001). When asked the same question multiple times, ChatGPT frequently provided different answers (16% of initial responses), including for both initially correct and incorrect answers, and gave plausible explanations for both correct and incorrect answers. ChatGPT's performance was impressive, but it currently demonstrates significant shortcomings for clinical or other high-stakes use. Addressing these limitations will be important to guide adoption in real-life situations. © 2023. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.
Pages: 379 - 380
Page count: 2
Related papers
44 items in total
  • [21] Evaluating Accuracy and Readability of Responses to Midlife Health Questions: A Comparative Analysis of Six Large Language Model Chatbots
    Mondal, Himel
    Tiu, Devendra Nath
    Mondal, Shaikat
    Dutta, Rajib
    Naskar, Avijit
    Podder, Indrashis
    JOURNAL OF MID-LIFE HEALTH, 2025, 16 (01) : 45 - 50
  • [22] Response to "Letter Regarding 'Exploring the Role of a Large Language Model on Carpal Tunnel Syndrome Management: An Observation Study of ChatGPT'"
    Seth, Ishith
    Gracias, Dylan
    Rozen, Warren M.
    JOURNAL OF HAND SURGERY-AMERICAN VOLUME, 2024, 49 (04) : e3 - e4
  • [23] The Accuracy of the Multimodal Large Language Model GPT-4 on Sample Questions From the Interventional Radiology Board Examination Response
    Ariyaratne, Sisith
    Jenko, Nathan
    Davies, A. Mark
    Iyengar, Karthikeyan P.
    Botchu, Rajesh
    ACADEMIC RADIOLOGY, 2024, 31 (08) : 3477 - 3477
  • [24] Comprehensive prediction and analysis of human protein essentiality based on a pretrained large language model
    Kang, Boming
    Fan, Rui
    Cui, Chunmei
    Cui, Qinghua
    NATURE COMPUTATIONAL SCIENCE, 2024, : 196 - 206
  • [25] STEM exam performance: Open- versus closed-book methods in the large language model era
    Mizori, Rasi
    Sadiq, Muhayman
    Ahmad, Malik Takreem
    Siu, Anthony
    Ahmad, Reubeen Rashid
    Yang, Zijing
    Oram, Helen
    Galloway, James
    CLINICAL TEACHER, 2025, 22 (01)
  • [26] Appropriateness of Answers to Common Preanesthesia Patient Questions Composed by the Large Language Model GPT-4 Compared to Human Authors
    Segal, Scott
    Saha, Amit K.
    Khanna, Ashish K.
    ANESTHESIOLOGY, 2024, 140 (02) : 333 - 335
  • [27] Comparative Analysis of the Response Accuracies of Large Language Models in the Korean National Dental Hygienist Examination Across Korean and English Questions
    Song, Eun Sun
    Lee, Seung-Pyo
    INTERNATIONAL JOURNAL OF DENTAL HYGIENE, 2024
  • [28] Doctor Versus Artificial Intelligence: Patient and Physician Evaluation of Large Language Model Responses to Rheumatology Patient Questions in a Cross-Sectional Study
    Ye, Carrie
    Zweck, Elric
    Ma, Zechen
    Smith, Justin
    Katz, Steven
    ARTHRITIS & RHEUMATOLOGY, 2024, 76 (03) : 479 - 484
  • [29] Doctor versus artificial intelligence: patient and physician evaluation of large language model responses to rheumatology patient questions: comment on the article by Ye et al
    Wang, Gang
    Zhuo, Ning
    Liu, Zhichun
    ARTHRITIS & RHEUMATOLOGY, 2024, 76 (06) : 984 - 984
  • [30] Response to the Letter to the Editor Regarding "Dual Versus Single Attending Surgeon Performance of Spinal Deformity Surgery? A Meta-Analysis"
    Daher, Mohammad
    Kreichati, Gaby
    Kharrat, Khalil
    Maroun, Ralph
    Aoun, Marven
    Chalhoub, Ralph
    Daniels, Alan H.
    Sebaaly, Amer
    WORLD NEUROSURGERY, 2024, 189 : 564 - 564