Capturing Failures of Large Language Models via Human Cognitive Biases

Cited: 0
Authors
Jones, Erik [1 ]
Steinhardt, Jacob [1 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Large language models generate complex, open-ended outputs: instead of outputting a class label, they write summaries, generate dialogue, or produce working code. To assess the reliability of these open-ended generation systems, we aim to identify qualitative categories of erroneous behavior, beyond identifying individual errors. To hypothesize and test for such qualitative errors, we draw inspiration from human cognitive biases: systematic patterns of deviation from rational judgment. Specifically, we use cognitive biases as motivation to (i) generate hypotheses for problems that models may have, and (ii) develop experiments that elicit these problems. Using code generation as a case study, we find that OpenAI's Codex errs predictably based on how the input prompt is framed, adjusts outputs towards anchors, and is biased towards outputs that mimic frequent training examples. We then use our framework to elicit high-impact errors such as incorrectly deleting files. Our results indicate that experimental methodology from cognitive science can help characterize how machine learning systems behave.
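The anchoring experiment the abstract describes can be sketched as a prompt-pair comparison: the same function specification is given to the model twice, once plainly and once preceded by a misleading "anchor" solution, and the two completions are checked for whether the anchored one copies the anchor's incorrect body. The sketch below is illustrative only and is not the authors' actual harness; `build_prompts`, `anchoring_effect`, and the hand-constructed completions are hypothetical stand-ins (a real experiment would obtain the completions from a code-generation API such as Codex).

```python
# Minimal sketch of an anchoring probe for a code-generation model.
# All names here are illustrative; real completions would come from model calls.

SPEC = '"""Return the sum of the squares of the numbers in xs."""'

# A plausible but incorrect "anchor" solution prepended to the prompt.
ANCHOR = "def sum_squares(xs):\n    return sum(xs)\n"


def build_prompts(spec: str, anchor: str) -> tuple:
    """Build a neutral prompt and an anchored prompt for the same spec."""
    neutral = f"def sum_squares(xs):\n    {spec}\n"
    anchored = anchor + "\n" + neutral
    return neutral, anchored


def anchoring_effect(neutral_out: str, anchored_out: str, anchor_body: str) -> bool:
    """Flag an anchoring error: the anchored completion reproduces the
    anchor's (incorrect) body while the neutral completion does not."""
    return anchor_body in anchored_out and anchor_body not in neutral_out


# Hand-constructed completions standing in for two model calls,
# to show how the check itself operates.
neutral_out = "return sum(x * x for x in xs)"   # correct completion
anchored_out = "return sum(xs)"                 # copies the misleading anchor
print(anchoring_effect(neutral_out, anchored_out, "return sum(xs)"))  # True
```

In a real run, a large anchoring effect would show up as the anchored prompts producing the anchor's wrong body far more often than the neutral prompts do, aggregated over many specifications.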
Pages: 15
Related Papers (50 total)
  • [41] Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination
    Yang, Nakyeong
    Kang, Taegwan
    Choi, Jungkyu
    Lee, Honglak
    Jung, Kyomin
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 9061 - 9073
  • [43] Capturing human categorization of natural images by combining deep networks and cognitive models
    Ruairidh M. Battleday
    Joshua C. Peterson
    Thomas L. Griffiths
    Nature Communications, 11
  • [44] Capturing cognitive causal paths in human reliability analysis with Bayesian network models
    Zwirglmaier, Kilian
    Straub, Daniel
    Groth, Katrina M.
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2017, 158 : 117 - 129
  • [45] A Generalizable Architecture for Explaining Robot Failures Using Behavior Trees and Large Language Models
    Tagliamonte, Christian
    Maccaline, Daniel
    LeMasurier, Gregory
    Yanco, Holly A.
    COMPANION OF THE 2024 ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI 2024 COMPANION, 2024, : 1038 - 1042
  • [46] Mitigating Cognitive Biases in Clinical Decision-Making Through Multi-Agent Conversations Using Large Language Models: Simulation Study
    Ke, Yuhe
    Yang, Rui
    Lie, Sui An
    Lim, Taylor Xin Yi
    Ning, Yilin
    Li, Irene
    Abdullah, Hairil Rizal
    Ting, Daniel Shu Wei
    Liu, Nan
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
  • [47] Driving and suppressing the human language network using large language models
    Tuckute, Greta
    Sathe, Aalok
    Srikant, Shashank
    Taliaferro, Maya
    Wang, Mingye
    Schrimpf, Martin
    Kay, Kendrick
    Fedorenko, Evelina
    NATURE HUMAN BEHAVIOUR, 2024, 8 (03) : 544 - 561
  • [49] Shadows of wisdom: Classifying meta-cognitive and morally grounded narrative content via large language models
    Stavropoulos, Alexander
    Crone, Damien L.
    Grossmann, Igor
    BEHAVIOR RESEARCH METHODS, 2024, 56 (07) : 7632 - 7646
  • [50] IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
    You, Haoxuan
    Sun, Rui
    Wang, Zhecan
    Chen, Long
    Wang, Gengyu
    Ayyubi, Hammad A.
    Chang, Kai-Wei
    Chang, Shih-Fu
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 11289 - 11303