Capturing Failures of Large Language Models via Human Cognitive Biases

Cited by: 0
Authors: Jones, Erik [1]; Steinhardt, Jacob [1]
Affiliations: [1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords: (none listed)
DOI: (not available)
CLC number: TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
Large language models generate complex, open-ended outputs: instead of outputting a class label, they write summaries, generate dialogue, or produce working code. In order to assess the reliability of these open-ended generation systems, we aim to identify qualitative categories of erroneous behavior, beyond identifying individual errors. To hypothesize and test for such qualitative errors, we draw inspiration from human cognitive biases: systematic patterns of deviation from rational judgement. Specifically, we use cognitive biases as motivation to (i) generate hypotheses for problems that models may have, and (ii) develop experiments that elicit these problems. Using code generation as a case study, we find that OpenAI's Codex errs predictably based on how the input prompt is framed, adjusts outputs towards anchors, and is biased towards outputs that mimic frequent training examples. We then use our framework to elicit high-impact errors such as incorrectly deleting files. Our results indicate that experimental methodology from cognitive science can help characterize how machine learning systems behave.
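
The anchoring experiment the abstract alludes to can be pictured as a simple prompt perturbation. Below is a minimal sketch, not the authors' released code: query_model is a hypothetical placeholder for any code-completion model, BASE_PROMPT and ANCHOR are illustrative strings, and the copy-detection heuristic is a simplification of the paper's evaluation.

# Minimal sketch of an anchoring probe: prepend an irrelevant "anchor"
# solution to an otherwise identical prompt and test whether the model's
# completion drifts toward reusing the anchor's lines.

def query_model(prompt: str) -> str:
    """Hypothetical stand-in; swap in a real completion API call."""
    return ""  # placeholder completion


BASE_PROMPT = (
    "def sum_squares(nums):\n"
    '    """Return the sum of the squares of nums."""\n'
)

# An incorrect solution used as the anchor; under the anchoring hypothesis,
# completions conditioned on it reuse its lines instead of solving the task.
ANCHOR = (
    "def sum_squares(nums):\n"
    "    return sum(nums)\n"
    "\n"
)


def anchored_prompt(anchor: str, prompt: str) -> str:
    """Framing manipulation: same task, anchor text prepended."""
    return anchor + prompt


def copies_anchor(completion: str, anchor: str) -> bool:
    """Crude proxy metric: does the completion reuse a non-trivial anchor line?"""
    anchor_lines = {line.strip() for line in anchor.splitlines() if line.strip()}
    return any(line.strip() in anchor_lines for line in completion.splitlines())


if __name__ == "__main__":
    baseline = query_model(BASE_PROMPT)
    anchored = query_model(anchored_prompt(ANCHOR, BASE_PROMPT))
    print("baseline copies anchor:", copies_anchor(baseline, ANCHOR))
    print("anchored copies anchor:", copies_anchor(anchored, ANCHOR))

Comparing the copy rate with and without the anchor across many tasks is one way to quantify the "adjusts outputs towards anchors" effect the abstract reports.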
Pages: 15
Related papers (50 items)
  • [21] Text Classification via Large Language Models
    Sun, Xiaofei
    Li, Xiaoya
    Li, Jiwei
    Wu, Fei
    Guo, Shangwei
    Zhang, Tianwei
    Wang, Guoyin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023: 8990 - 9005
  • [22] Game Generation via Large Language Models
    Hu, Chengpeng
    Zhao, Yunlong
    Liu, Jialin
    2024 IEEE CONFERENCE ON GAMES, COG 2024, 2024
  • [23] Harnessing Large Language Models for Cognitive Assistants in Factories
    Freire, S. Kernan
    Foosherian, Mina
    Wang, C.
    Niforatos, E.
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON CONVERSATIONAL USER INTERFACES, CUI 2023, 2023
  • [24] Large Language Models: Opportunities and Challenges For Cognitive Assessment
    Efremova, Maria
    Kubiak, Emeric
    Baron, Simon
    Bernard, David
    EUROPEAN JOURNAL OF PSYCHOLOGY OPEN, 2023, 82 : 133 - 134
  • [25] Leveraging Cognitive Science for Testing Large Language Models
    Srinivasan, Ramya
    Inakoshi, Hiroya
    Uchino, Kanji
    2023 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE TESTING, AITEST, 2023: 169 - 171
  • [26] Unveiling Selection Biases: Exploring Order and Token Sensitivity in Large Language Models
    Wei, Sheng-Lun
    Wu, Cheng-Kuang
    Huang, Hen-Hsen
    Chen, Hsin-Hsi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 5598 - 5621
  • [27] Cognitive causes of 'like me' race and gender biases in human language production
    Brough, Jessica
    Harris, Lasana T.
    Wu, Shi Hui
    Branigan, Holly P.
    Rabagliati, Hugh
    NATURE HUMAN BEHAVIOUR, 2024, 8 (09): 1706 - 1715
  • [28] Large pre-trained language models contain human-like biases of what is right and wrong to do
    Schramowski, Patrick
    Turan, Cigdem
    Andersen, Nico
    Rothkopf, Constantin A.
    Kersting, Kristian
    NATURE MACHINE INTELLIGENCE, 2022, 4: 258 - 268
  • [29] Do Large Language Models Show Human-like Biases? Exploring Confidence-Competence Gap in AI
    Singh, Aniket Kumar
    Lamichhane, Bishal
    Devkota, Suman
    Dhakal, Uttam
    Dhakal, Chandra
    INFORMATION, 2024, 15 (02)