Capturing Failures of Large Language Models via Human Cognitive Biases

Cited by: 0
Authors: Jones, Erik [1]; Steinhardt, Jacob [1]
Affiliations: [1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords: (none listed)
DOI: (not available)
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Large language models generate complex, open-ended outputs: instead of outputting a class label, they write summaries, generate dialogue, or produce working code. In order to assess the reliability of these open-ended generation systems, we aim to identify qualitative categories of erroneous behavior, beyond identifying individual errors. To hypothesize and test for such qualitative errors, we draw inspiration from human cognitive biases: systematic patterns of deviation from rational judgement. Specifically, we use cognitive biases as motivation to (i) generate hypotheses for problems that models may have, and (ii) develop experiments that elicit these problems. Using code generation as a case study, we find that OpenAI's Codex errs predictably based on how the input prompt is framed, adjusts outputs towards anchors, and is biased towards outputs that mimic frequent training examples. We then use our framework to elicit high-impact errors such as incorrectly deleting files. Our results indicate that experimental methodology from cognitive science can help characterize how machine learning systems behave.
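
The anchoring experiment the abstract alludes to can be pictured as a simple prompt perturbation. Below is a minimal sketch, not the authors' released code: query_model is a hypothetical placeholder for any code-completion model, BASE_PROMPT and ANCHOR are illustrative strings, and the copy-detection heuristic is a simplification of the paper's evaluation.

# Minimal sketch of an anchoring probe: prepend an irrelevant "anchor"
# solution to an otherwise identical prompt and test whether the model's
# completion drifts toward reusing the anchor's lines.

def query_model(prompt: str) -> str:
    """Hypothetical stand-in; swap in a real completion API call."""
    return ""  # placeholder completion


BASE_PROMPT = (
    "def sum_squares(nums):\n"
    '    """Return the sum of the squares of nums."""\n'
)

# An incorrect solution used as the anchor; under the anchoring hypothesis,
# completions conditioned on it reuse its lines instead of solving the task.
ANCHOR = (
    "def sum_squares(nums):\n"
    "    return sum(nums)\n"
    "\n"
)


def anchored_prompt(anchor: str, prompt: str) -> str:
    """Framing manipulation: same task, anchor text prepended."""
    return anchor + prompt


def copies_anchor(completion: str, anchor: str) -> bool:
    """Crude proxy metric: does the completion reuse a non-trivial anchor line?"""
    anchor_lines = {line.strip() for line in anchor.splitlines() if line.strip()}
    return any(line.strip() in anchor_lines for line in completion.splitlines())


if __name__ == "__main__":
    baseline = query_model(BASE_PROMPT)
    anchored = query_model(anchored_prompt(ANCHOR, BASE_PROMPT))
    print("baseline copies anchor:", copies_anchor(baseline, ANCHOR))
    print("anchored copies anchor:", copies_anchor(anchored, ANCHOR))

Comparing the copy rate with and without the anchor across many tasks is one way to quantify the "adjusts outputs towards anchors" effect the abstract reports.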
Pages: 15
Related papers (50 items)
  • [21] Text Classification via Large Language Models
    Sun, Xiaofei
    Li, Xiaoya
    Li, Jiwei
    Wu, Fei
    Guo, Shangwei
    Zhang, Tianwei
    Wang, Guoyin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023: 8990 - 9005
  • [22] Game Generation via Large Language Models
    Hu, Chengpeng
    Zhao, Yunlong
    Liu, Jialin
    2024 IEEE CONFERENCE ON GAMES, COG 2024, 2024
  • [23] Harnessing Large Language Models for Cognitive Assistants in Factories
    Freire, S. Kernan
    Foosherian, Mina
    Wang, C.
    Niforatos, E.
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON CONVERSATIONAL USER INTERFACES, CUI 2023, 2023
  • [24] Large Language Models: Opportunities and Challenges For Cognitive Assessment
    Efremova, Maria
    Kubiak, Emeric
    Baron, Simon
    Bernard, David
    EUROPEAN JOURNAL OF PSYCHOLOGY OPEN, 2023, 82 : 133 - 134
  • [25] Leveraging Cognitive Science for Testing Large Language Models
    Srinivasan, Ramya
    Inakoshi, Hiroya
    Uchino, Kanji
    2023 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE TESTING, AITEST, 2023: 169 - 171
  • [26] Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models
    Wei, Sheng-Lun
    Wu, Cheng-Kuang
    Huang, Hen-Hsen
    Chen, Hsin-Hsi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 5598 - 5621
  • [27] Cognitive causes of 'like me' race and gender biases in human language production
    Brough, Jessica
    Harris, Lasana T.
    Wu, Shi Hui
    Branigan, Holly P.
    Rabagliati, Hugh
    NATURE HUMAN BEHAVIOUR, 2024, 8 (09): 1706 - 1715
  • [28] Large pre-trained language models contain human-like biases of what is right and wrong to do
    Schramowski, Patrick
    Turan, Cigdem
    Andersen, Nico
    Rothkopf, Constantin A.
    Kersting, Kristian
    NATURE MACHINE INTELLIGENCE, 2022, 4: 258 - 268
  • [29] Do Large Language Models Show Human-like Biases? Exploring Confidence-Competence Gap in AI
    Singh, Aniket Kumar
    Lamichhane, Bishal
    Devkota, Suman
    Dhakal, Uttam
    Dhakal, Chandra
    INFORMATION, 2024, 15 (02)