Large Language Models lack essential metacognition for reliable medical reasoning

Cited by: 1

Authors:

Griot, Maxime [1,2]

Hemptinne, Coralie [1,3]

Vanderdonckt, Jean [2]

Yuksel, Demet [1,4]

Affiliations:

[1] Catholic Univ Louvain, Inst Neurosci, Brussels, Belgium

[2] Catholic Univ Louvain, Louvain Res Inst Management & Org, Louvain La Neuve, Belgium

[3] Clin Univ St Luc, Ophthalmol, Brussels, Belgium

[4] Clin Univ St Luc, Med Informat Dept, Brussels, Belgium

Source:

NATURE COMMUNICATIONS | 2025 / Volume 16 / Issue 01

Keywords:

REFLECTIVE PRACTICE; STRATEGIES

DOI:

10.1038/s41467-024-55628-6

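Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]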

Discipline classification codes:

07; 0710; 09

Abstract:

Large Language Models have demonstrated expert-level accuracy on medical board examinations, suggesting potential for clinical decision support systems. However, their metacognitive abilities, crucial for medical decision-making, remain largely unexplored. To address this gap, we developed MetaMedQA, a benchmark incorporating confidence scores and metacognitive tasks into multiple-choice medical questions. We evaluated twelve models on dimensions including confidence-based accuracy, missing answer recall, and unknown recall. Despite high accuracy on multiple-choice questions, our study revealed significant metacognitive deficiencies across all tested models. Models consistently failed to recognize their knowledge limitations and provided confident answers even when correct options were absent. In this work, we show that current models exhibit a critical disconnect between perceived and actual capabilities in medical reasoning, posing significant risks in clinical settings. Our findings emphasize the need for more robust evaluation frameworks that incorporate metacognitive abilities, essential for developing reliable Large Language Model enhanced clinical decision support systems.

Pages: 10

