Capturing Failures of Large Language Models via Human Cognitive Biases

Cited: 0
Authors: Jones, Erik [1]; Steinhardt, Jacob [1]
Institutions: [1] University of California, Berkeley, Berkeley, CA 94720, USA
Keywords: (none listed)
DOI: (not available)
CLC classification: TP18 (Artificial Intelligence Theory)
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Large language models generate complex, open-ended outputs: instead of outputting a class label, they write summaries, generate dialogue, or produce working code. In order to assess the reliability of these open-ended generation systems, we aim to identify qualitative categories of erroneous behavior, beyond identifying individual errors. To hypothesize and test for such qualitative errors, we draw inspiration from human cognitive biases, systematic patterns of deviation from rational judgment. Specifically, we use cognitive biases as motivation to (i) generate hypotheses for problems that models may have, and (ii) develop experiments that elicit these problems. Using code generation as a case study, we find that OpenAI's Codex errs predictably based on how the input prompt is framed, adjusts outputs towards anchors, and is biased towards outputs that mimic frequent training examples. We then use our framework to elicit high-impact errors such as incorrectly deleting files. Our results indicate that experimental methodology from cognitive science can help characterize how machine learning systems behave.
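To make the kind of experiment described in the abstract concrete, the sketch below probes a code-generation model for anchoring: an irrelevant code snippet is prepended to an otherwise unchanged prompt, and completions are checked for lines copied from that snippet. This is a minimal illustrative simplification, not the paper's actual harness; generate_code is a hypothetical stand-in for a call to a model such as Codex, and copies_anchor is a deliberately simple proxy for "the output drifted towards the anchor".

# Anchoring probe: a minimal sketch, not the paper's experimental code.
# generate_code is a hypothetical stand-in for a code-generation model; replace
# its body with a real completion-API call to run the probe against a model.

BASE_PROMPT = (
    "def sum_of_squares(nums):\n"
    '    """Return the sum of the squares of the numbers in nums."""\n'
)

# Irrelevant code prepended as an anchor; a correct solution has no reason to
# reuse any of these lines.
ANCHOR_SNIPPET = (
    "def product_of_cubes(nums):\n"
    "    result = 1\n"
    "    for n in nums:\n"
    "        result *= n ** 3\n"
    "    return result\n\n"
)


def generate_code(prompt: str) -> str:
    """Hypothetical model call; swap in a real code-generation API here."""
    # Placeholder completion so the sketch runs end to end without a model.
    return "    return sum(n ** 2 for n in nums)\n"


def copies_anchor(completion: str, anchor: str) -> bool:
    """Flag completions that reuse non-trivial lines from the anchor snippet."""
    anchor_lines = {ln.strip() for ln in anchor.splitlines() if len(ln.strip()) > 10}
    completion_lines = {ln.strip() for ln in completion.splitlines()}
    return bool(anchor_lines & completion_lines)


if __name__ == "__main__":
    baseline = generate_code(BASE_PROMPT)
    anchored = generate_code(ANCHOR_SNIPPET + BASE_PROMPT)
    print("baseline copies anchor:", copies_anchor(baseline, ANCHOR_SNIPPET))
    print("anchored copies anchor:", copies_anchor(anchored, ANCHOR_SNIPPET))

With a real model behind generate_code, running many such prompt pairs and comparing how often the anchored completions reuse anchor lines gives the kind of systematic, hypothesis-driven comparison the abstract alludes to.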
Pages: 15
Related papers (50 total; items 31-40 shown)
  • [31] Wang, Shuai; Zhuang, Shengyao; Zuccon, Guido. Large Language Models Based Stemming for Information Retrieval: Promises, Pitfalls and Failures. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024), 2024: 2492-2496.
  • [32] Theocharous, Georgios; Healey, Jennifer; Mahadevan, Sridhar; Saad, Michele. Personalizing with Human Cognitive Biases. Adjunct Publication of the 27th Conference on User Modeling, Adaptation and Personalization (ACM UMAP '19 Adjunct), 2019: 13-17.
  • [33] Chavez, Mariana Arroyo; Thompson, Bernard; Feanny, Molly; Alabi, Kafayat; Kim, Minchan; Ming, Lu; Glasser, Abraham; Kushalnagar, Raja; Vogler, Christian. Customization of Closed Captions via Large Language Models. Computers Helping People with Special Needs, Part II (ICCHP 2024), 2024, 14751: 50-58.
  • [34] Wang, Mengru; Zhang, Ningyu; Xu, Ziwen; Xi, Zekun; Deng, Shumin; Yao, Yunzhi; Zhang, Qishen; Yang, Linyi; Wang, Jindong; Chen, Huajun. Detoxifying Large Language Models via Knowledge Editing. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long Papers, 2024: 3093-3118.
  • [35] Soru, Tommaso; Marshall, Jim. Trend Extraction and Analysis via Large Language Models. 18th IEEE International Conference on Semantic Computing (ICSC 2024), 2024: 285-288.
  • [36] Xu, Yilongfei; Feng, Jincao; Miao, Weikai. Learning from Failures: Translation of Natural Language Requirements into Linear Temporal Logic with Large Language Models. 2024 IEEE 24th International Conference on Software Quality, Reliability and Security (QRS), 2024: 204-215.
  • [37] Torres, Nicolas; Ulloa, Catalina; Araya, Ignacio; Ayala, Matias; Jara, Sebastian. A comprehensive analysis of gender, racial, and prompt-induced biases in large language models. International Journal of Data Science and Analytics, 2024.
  • [38] Qu, Youzhi; Du, Penghui; Che, Wenxin; Wei, Chen; Zhang, Chi; Ouyang, Wanli; Bian, Yatao; Xu, Feiyang; Hu, Bin; Du, Kai; Wu, Haiyan; Liu, Jia; Liu, Quanying. Promoting interactions between cognitive science and large language models. The Innovation, 2024, 5(2).
  • [40] Momennejad, Ida; Hasanbeig, Hosein; Frujeri, Felipe Vieira; Sharma, Hiteshi; Ness, Robert Osazuwa; Jojic, Nebojsa; Palangi, Hamid; Larson, Jonathan. Evaluating Cognitive Maps and Planning in Large Language Models with CogEval. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.