Response to correspondence regarding "Analysis of large-language model versus human performance for genetics questions"

Cited by: 1

Authors:

Duong, Dat [1]

Solomon, Benjamin D. [1]

Affiliations:

[1] Natl Human Genome Res Inst, Med Genom Unit, Med Genet Branch, Bethesda, MD 20894 USA

Source:

EUROPEAN JOURNAL OF HUMAN GENETICS | 2024 / Vol. 32 / Issue 04

Funding:

National Institutes of Health (USA)

DOI:

10.1038/s41431-023-01444-3

Chinese Library Classification (CLC):

Q5 [Biochemistry]; Q7 [Molecular Biology]

Subject classification codes:

071010; 081704

Abstract:

Large-language models like ChatGPT have recently received a great deal of attention. One area of interest is how these models could be used in biomedical contexts, including those related to human genetics. To assess one facet of this, we compared the performance of ChatGPT with that of human respondents (13,642 human responses) in answering 85 multiple-choice questions about aspects of human genetics. Overall, ChatGPT did not perform significantly differently from human respondents (p = 0.8327); ChatGPT was 68.2% accurate, compared to 66.6% accuracy for human respondents. Both ChatGPT and humans performed better on memorization-type questions than on critical-thinking questions (p < 0.0001). When asked the same question multiple times, ChatGPT frequently provided different answers (16% of initial responses), including for both initially correct and incorrect answers, and gave plausible explanations for both correct and incorrect answers. ChatGPT's performance was impressive, but the model currently demonstrates significant shortcomings for clinical or other high-stakes use. Addressing these limitations will be important to guide adoption in real-life situations. © 2023. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.

Citation

Pages: 379-380

Page count: 2

