Large Language Models lack essential metacognition for reliable medical reasoning

Cited by: 1
Authors
Griot, Maxime [1 ,2 ]
Hemptinne, Coralie [1 ,3 ]
Vanderdonckt, Jean [2 ]
Yuksel, Demet [1 ,4 ]
Affiliations
[1] Catholic Univ Louvain, Inst Neurosci, Brussels, Belgium
[2] Catholic Univ Louvain, Louvain Res Inst Management & Org, Louvain La Neuve, Belgium
[3] Clin Univ St Luc, Ophthalmol, Brussels, Belgium
[4] Clin Univ St Luc, Med Informat Dept, Brussels, Belgium
Keywords
REFLECTIVE PRACTICE; STRATEGIES;
DOI
10.1038/s41467-024-55628-6
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy, Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline codes
07 ; 0710 ; 09 ;
Abstract
Large Language Models have demonstrated expert-level accuracy on medical board examinations, suggesting potential for clinical decision support systems. However, their metacognitive abilities, crucial for medical decision-making, remain largely unexplored. To address this gap, we developed MetaMedQA, a benchmark incorporating confidence scores and metacognitive tasks into multiple-choice medical questions. We evaluated twelve models on dimensions including confidence-based accuracy, missing answer recall, and unknown recall. Despite high accuracy on multiple-choice questions, our study revealed significant metacognitive deficiencies across all tested models. Models consistently failed to recognize their knowledge limitations and provided confident answers even when correct options were absent. In this work, we show that current models exhibit a critical disconnect between perceived and actual capabilities in medical reasoning, posing significant risks in clinical settings. Our findings emphasize the need for more robust evaluation frameworks that incorporate metacognitive abilities, essential for developing reliable Large Language Model-enhanced clinical decision support systems.
Pages: 10
Related papers
50 records
  • [1] Reasoning with large language models for medical question answering
    Lucas, Mary M.
    Yang, Justin
    Pomeroy, Jon K.
    Yang, Christopher C.
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09)
  • [2] Large Language Models Are Reasoning Teachers
    Ho, Namgyu
    Schmid, Laura
    Yun, Se-Young
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 14852 - 14882
  • [3] MEDAGENTS: Large Language Models as Collaborators for Zero-shot Medical Reasoning
    Tang, Xiangru
    Zou, Anni
    Zhang, Zhuosheng
    Li, Ziming
    Zhao, Yilun
    Zhang, Xingyao
    Cohan, Arman
    Gerstein, Mark
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 599 - 621
  • [4] Reasons in the Loop: The Role of Large Language Models in Medical Co-Reasoning
    Mann, Sebastian Porsdam
    Earp, Brian D.
    Liu, Peng
    Savulescu, Julian
    AMERICAN JOURNAL OF BIOETHICS, 2024, 24 (09): : 105 - 107
  • [5] Towards Reasoning in Large Language Models: A Survey
    Huang, Jie
    Chang, Kevin Chen-Chuan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 1049 - 1065
  • [6] Conversations on reasoning: Large language models in diagnosis
    Restrepo, Daniel
    Rodman, Adam
    Abdulnour, Raja-Elie
    JOURNAL OF HOSPITAL MEDICINE, 2024, 19 (08) : 731 - 735
  • [7] Emergent analogical reasoning in large language models
    Webb, Taylor
    Holyoak, Keith J.
    Lu, Hongjing
    NATURE HUMAN BEHAVIOUR, 2023, 7 : 1526 - 1541
  • [8] Large Language Models are Visual Reasoning Coordinators
    Chen, Liangyu
    Li, Bo
    Shen, Sheng
    Yang, Jingkang
    Li, Chunyuan
    Keutzer, Kurt
    Darrell, Trevor
    Liu, Ziwei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [9] Inductive reasoning in humans and large language models
    Han, Simon Jerome
    Ransom, Keith J.
    Perfors, Andrew
    Kemp, Charles
    COGNITIVE SYSTEMS RESEARCH, 2024, 83
  • [10] Conditional and Modal Reasoning in Large Language Models
    Holliday, Wesley H.
    Mandelkern, Matthew
    Zhang, Cedegao E.
    EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference, 2024, : 3800 - 3821