Capturing Failures of Large Language Models via Human Cognitive Biases

Cited: 0
Authors
Jones, Erik [1 ]
Steinhardt, Jacob [1 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Large language models generate complex, open-ended outputs: instead of outputting a class label, they write summaries, generate dialogue, or produce working code. To assess the reliability of these open-ended generation systems, we aim to identify qualitative categories of erroneous behavior, beyond identifying individual errors. To hypothesize and test for such qualitative errors, we draw inspiration from human cognitive biases: systematic patterns of deviation from rational judgment. Specifically, we use cognitive biases as motivation to (i) generate hypotheses for problems that models may have, and (ii) develop experiments that elicit these problems. Using code generation as a case study, we find that OpenAI's Codex errs predictably based on how the input prompt is framed, adjusts outputs towards anchors, and is biased towards outputs that mimic frequent training examples. We then use our framework to elicit high-impact errors such as incorrectly deleting files. Our results indicate that experimental methodology from cognitive science can help characterize how machine learning systems behave.
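The anchoring experiment the abstract describes can be sketched as a prompt-pair comparison: the same function specification is given to the model twice, once plainly and once preceded by a misleading "anchor" solution, and the two completions are checked for whether the anchored one copies the anchor's incorrect body. The sketch below is illustrative only and is not the authors' actual harness; `build_prompts`, `anchoring_effect`, and the hand-constructed completions are hypothetical stand-ins (a real experiment would obtain the completions from a code-generation API such as Codex).

```python
# Minimal sketch of an anchoring probe for a code-generation model.
# All names here are illustrative; real completions would come from model calls.

SPEC = '"""Return the sum of the squares of the numbers in xs."""'

# A plausible but incorrect "anchor" solution prepended to the prompt.
ANCHOR = "def sum_squares(xs):\n    return sum(xs)\n"


def build_prompts(spec: str, anchor: str) -> tuple:
    """Build a neutral prompt and an anchored prompt for the same spec."""
    neutral = f"def sum_squares(xs):\n    {spec}\n"
    anchored = anchor + "\n" + neutral
    return neutral, anchored


def anchoring_effect(neutral_out: str, anchored_out: str, anchor_body: str) -> bool:
    """Flag an anchoring error: the anchored completion reproduces the
    anchor's (incorrect) body while the neutral completion does not."""
    return anchor_body in anchored_out and anchor_body not in neutral_out


# Hand-constructed completions standing in for two model calls,
# to show how the check itself operates.
neutral_out = "return sum(x * x for x in xs)"   # correct completion
anchored_out = "return sum(xs)"                 # copies the misleading anchor
print(anchoring_effect(neutral_out, anchored_out, "return sum(xs)"))  # True
```

In a real run, a large anchoring effect would show up as the anchored prompts producing the anchor's wrong body far more often than the neutral prompts do, aggregated over many specifications.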
Pages: 15
Related Papers (50 total)
  • [41] Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination
    Yang, Nakyeong
    Kang, Taegwan
    Choi, Jungkyu
    Lee, Honglak
    Jung, Kyomin
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 9061 - 9073
  • [43] Capturing human categorization of natural images by combining deep networks and cognitive models
    Ruairidh M. Battleday
    Joshua C. Peterson
    Thomas L. Griffiths
    Nature Communications, 11
  • [44] Capturing cognitive causal paths in human reliability analysis with Bayesian network models
    Zwirglmaier, Kilian
    Straub, Daniel
    Groth, Katrina M.
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2017, 158 : 117 - 129
  • [45] A Generalizable Architecture for Explaining Robot Failures Using Behavior Trees and Large Language Models
    Tagliamonte, Christian
    Maccaline, Daniel
    LeMasurier, Gregory
    Yanco, Holly A.
    COMPANION OF THE 2024 ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION, HRI 2024 COMPANION, 2024, : 1038 - 1042
  • [46] Mitigating Cognitive Biases in Clinical Decision-Making Through Multi-Agent Conversations Using Large Language Models: Simulation Study
    Ke, Yuhe
    Yang, Rui
    Lie, Sui An
    Lim, Taylor Xin Yi
    Ning, Yilin
    Li, Irene
    Abdullah, Hairil Rizal
    Ting, Daniel Shu Wei
    Liu, Nan
    JOURNAL OF MEDICAL INTERNET RESEARCH, 2024, 26
  • [47] Driving and suppressing the human language network using large language models
    Tuckute, Greta
    Sathe, Aalok
    Srikant, Shashank
    Taliaferro, Maya
    Wang, Mingye
    Schrimpf, Martin
    Kay, Kendrick
    Fedorenko, Evelina
    NATURE HUMAN BEHAVIOUR, 2024, 8 (03) : 544 - 561
  • [49] Shadows of wisdom: Classifying meta-cognitive and morally grounded narrative content via large language models
    Stavropoulos, Alexander
    Crone, Damien L.
    Grossmann, Igor
    BEHAVIOR RESEARCH METHODS, 2024, 56 (07) : 7632 - 7646
  • [50] IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
    You, Haoxuan
    Sun, Rui
    Wang, Zhecan
    Chen, Long
    Wang, Gengyu
    Ayyubi, Hammad A.
    Chang, Kai-Wei
    Chang, Shih-Fu
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 11289 - 11303