Large Language Models lack essential metacognition for reliable medical reasoning

Cited by: 1
Authors
Griot, Maxime [1, 2]
Hemptinne, Coralie [1, 3]
Vanderdonckt, Jean [2]
Yuksel, Demet [1, 4]
Affiliations
[1] Catholic Univ Louvain, Inst Neurosci, Brussels, Belgium
[2] Catholic Univ Louvain, Louvain Res Inst Management & Org, Louvain La Neuve, Belgium
[3] Clin Univ St Luc, Ophthalmol, Brussels, Belgium
[4] Clin Univ St Luc, Med Informat Dept, Brussels, Belgium
Keywords
REFLECTIVE PRACTICE; STRATEGIES
DOI
10.1038/s41467-024-55628-6
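Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract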
Large Language Models have demonstrated expert-level accuracy on medical board examinations, suggesting potential for clinical decision support systems. However, their metacognitive abilities, crucial for medical decision-making, remain largely unexplored. To address this gap, we developed MetaMedQA, a benchmark incorporating confidence scores and metacognitive tasks into multiple-choice medical questions. We evaluated twelve models on dimensions including confidence-based accuracy, missing answer recall, and unknown recall. Despite high accuracy on multiple-choice questions, our study revealed significant metacognitive deficiencies across all tested models. Models consistently failed to recognize their knowledge limitations and provided confident answers even when correct options were absent. In this work, we show that current models exhibit a critical disconnect between perceived and actual capabilities in medical reasoning, posing significant risks in clinical settings. Our findings emphasize the need for more robust evaluation frameworks that incorporate metacognitive abilities, essential for developing reliable Large Language Model enhanced clinical decision support systems.
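The abstract names three evaluation dimensions (confidence-based accuracy, missing answer recall, unknown recall) without defining them. The sketch below shows one plausible way such metrics could be computed. It is not the authors' implementation: the record fields (`kind`, `gold`, `answer`, `confidence`), the `"NONE"` and `"UNKNOWN"` labels, and the sample data are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical model outputs; field names and labels are assumptions,
# not the MetaMedQA schema.
records = [
    {"kind": "standard", "gold": "A", "answer": "A", "confidence": 5},
    {"kind": "standard", "gold": "B", "answer": "C", "confidence": 5},
    {"kind": "missing", "gold": "NONE", "answer": "A", "confidence": 4},
    {"kind": "missing", "gold": "NONE", "answer": "NONE", "confidence": 2},
    {"kind": "unknown", "gold": "UNKNOWN", "answer": "B", "confidence": 5},
]


def recall(records, kind, expected):
    """Fraction of questions of type `kind` where the model chose `expected`.

    Assumed reading: 'missing answer recall' counts how often the model
    picks the 'none of the above' option when the correct choice was
    removed; 'unknown recall' counts how often it admits it cannot answer.
    """
    subset = [r for r in records if r["kind"] == kind]
    if not subset:
        return None  # metric undefined when no such questions exist
    hits = sum(1 for r in subset if r["answer"] == expected)
    return hits / len(subset)


def accuracy_by_confidence(records):
    """Accuracy grouped by the model's self-reported confidence score."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r["confidence"]].append(r["answer"] == r["gold"])
    return {c: sum(v) / len(v) for c, v in sorted(buckets.items())}


missing_recall = recall(records, "missing", "NONE")
unknown_recall = recall(records, "unknown", "UNKNOWN")
per_confidence = accuracy_by_confidence(records)
```

Under these assumed definitions, a well-calibrated model would show accuracy rising with confidence and high recall on both special question types. The failure pattern the abstract describes would appear as low recall together with high confidence on wrong answers.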
Pages: 10