Capturing Failures of Large Language Models via Human Cognitive Biases

Cited: 0
Authors: Jones, Erik [1]; Steinhardt, Jacob [1]
Institutions: [1] University of California, Berkeley, Berkeley, CA 94720, USA
Keywords: (none listed)
DOI: (not available)
CLC classification: TP18 (Artificial Intelligence Theory)
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Large language models generate complex, open-ended outputs: instead of outputting a class label, they write summaries, generate dialogue, or produce working code. In order to assess the reliability of these open-ended generation systems, we aim to identify qualitative categories of erroneous behavior, beyond identifying individual errors. To hypothesize and test for such qualitative errors, we draw inspiration from human cognitive biases, systematic patterns of deviation from rational judgment. Specifically, we use cognitive biases as motivation to (i) generate hypotheses for problems that models may have, and (ii) develop experiments that elicit these problems. Using code generation as a case study, we find that OpenAI's Codex errs predictably based on how the input prompt is framed, adjusts outputs towards anchors, and is biased towards outputs that mimic frequent training examples. We then use our framework to elicit high-impact errors such as incorrectly deleting files. Our results indicate that experimental methodology from cognitive science can help characterize how machine learning systems behave.
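To make the kind of experiment described in the abstract concrete, the sketch below probes a code-generation model for anchoring: an irrelevant code snippet is prepended to an otherwise unchanged prompt, and completions are checked for lines copied from that snippet. This is a minimal illustrative simplification, not the paper's actual harness; generate_code is a hypothetical stand-in for a call to a model such as Codex, and copies_anchor is a deliberately simple proxy for "the output drifted towards the anchor".

# Anchoring probe: a minimal sketch, not the paper's experimental code.
# generate_code is a hypothetical stand-in for a code-generation model; replace
# its body with a real completion-API call to run the probe against a model.

BASE_PROMPT = (
    "def sum_of_squares(nums):\n"
    '    """Return the sum of the squares of the numbers in nums."""\n'
)

# Irrelevant code prepended as an anchor; a correct solution has no reason to
# reuse any of these lines.
ANCHOR_SNIPPET = (
    "def product_of_cubes(nums):\n"
    "    result = 1\n"
    "    for n in nums:\n"
    "        result *= n ** 3\n"
    "    return result\n\n"
)


def generate_code(prompt: str) -> str:
    """Hypothetical model call; swap in a real code-generation API here."""
    # Placeholder completion so the sketch runs end to end without a model.
    return "    return sum(n ** 2 for n in nums)\n"


def copies_anchor(completion: str, anchor: str) -> bool:
    """Flag completions that reuse non-trivial lines from the anchor snippet."""
    anchor_lines = {ln.strip() for ln in anchor.splitlines() if len(ln.strip()) > 10}
    completion_lines = {ln.strip() for ln in completion.splitlines()}
    return bool(anchor_lines & completion_lines)


if __name__ == "__main__":
    baseline = generate_code(BASE_PROMPT)
    anchored = generate_code(ANCHOR_SNIPPET + BASE_PROMPT)
    print("baseline copies anchor:", copies_anchor(baseline, ANCHOR_SNIPPET))
    print("anchored copies anchor:", copies_anchor(anchored, ANCHOR_SNIPPET))

With a real model behind generate_code, running many such prompt pairs and comparing how often the anchored completions reuse anchor lines gives the kind of systematic, hypothesis-driven comparison the abstract alludes to.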
Pages: 15
Related papers (50 total; items 31-40 shown)
  • [31] Wang, Shuai; Zhuang, Shengyao; Zuccon, Guido. Large Language Models Based Stemming for Information Retrieval: Promises, Pitfalls and Failures. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024), 2024: 2492-2496.
  • [32] Theocharous, Georgios; Healey, Jennifer; Mahadevan, Sridhar; Saad, Michele. Personalizing with Human Cognitive Biases. Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization (ACM UMAP '19 Adjunct), 2019: 13-17.
  • [33] Chavez, Mariana Arroyo; Thompson, Bernard; Feanny, Molly; Alabi, Kafayat; Kim, Minchan; Ming, Lu; Glasser, Abraham; Kushalnagar, Raja; Vogler, Christian. Customization of Closed Captions via Large Language Models. Computers Helping People with Special Needs, Part II (ICCHP 2024), 2024, 14751: 50-58.
  • [34] Wang, Mengru; Zhang, Ningyu; Xu, Ziwen; Xi, Zekun; Deng, Shumin; Yao, Yunzhi; Zhang, Qishen; Yang, Linyi; Wang, Jindong; Chen, Huajun. Detoxifying Large Language Models via Knowledge Editing. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long Papers, 2024: 3093-3118.
  • [35] Soru, Tommaso; Marshall, Jim. Trend Extraction and Analysis via Large Language Models. 18th IEEE International Conference on Semantic Computing (ICSC 2024), 2024: 285-288.
  • [36] Xu, Yilongfei; Feng, Jincao; Miao, Weikai. Learning from Failures: Translation of Natural Language Requirements into Linear Temporal Logic with Large Language Models. 2024 IEEE 24th International Conference on Software Quality, Reliability and Security (QRS), 2024: 204-215.
  • [37] Torres, Nicolas; Ulloa, Catalina; Araya, Ignacio; Ayala, Matias; Jara, Sebastian. A comprehensive analysis of gender, racial, and prompt-induced biases in large language models. International Journal of Data Science and Analytics, 2024.
  • [38] Qu, Youzhi; Du, Penghui; Che, Wenxin; Wei, Chen; Zhang, Chi; Ouyang, Wanli; Bian, Yatao; Xu, Feiyang; Hu, Bin; Du, Kai; Wu, Haiyan; Liu, Jia; Liu, Quanying. Promoting interactions between cognitive science and large language models. The Innovation, 2024, 5(2).
  • [40] Momennejad, Ida; Hasanbeig, Hosein; Frujeri, Felipe Vieira; Sharma, Hiteshi; Ness, Robert Osazuwa; Jojic, Nebojsa; Palangi, Hamid; Larson, Jonathan. Evaluating Cognitive Maps and Planning in Large Language Models with CogEval. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.