Capturing Failures of Large Language Models via Human Cognitive Biases

Cited: 0
Authors
Jones, Erik [1]
Steinhardt, Jacob [1]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Large language models generate complex, open-ended outputs: instead of outputting a class label they write summaries, generate dialogue, or produce working code. In order to assess the reliability of these open-ended generation systems, we aim to identify qualitative categories of erroneous behavior, beyond identifying individual errors. To hypothesize and test for such qualitative errors, we draw inspiration from human cognitive biases: systematic patterns of deviation from rational judgement. Specifically, we use cognitive biases as motivation to (i) generate hypotheses for problems that models may have, and (ii) develop experiments that elicit these problems. Using code generation as a case study, we find that OpenAI's Codex errs predictably based on how the input prompt is framed, adjusts outputs towards anchors, and is biased towards outputs that mimic frequent training examples. We then use our framework to elicit high-impact errors such as incorrectly deleting files. Our results indicate that experimental methodology from cognitive science can help characterize how machine learning systems behave.
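As a concrete illustration of the anchoring experiments described in the abstract, the following Python sketch (not from the paper; generate is a hypothetical stand-in for any code-generation client) prepends a deliberately incorrect solution to a prompt and measures how often the model's completion copies the anchor's mistake:

    # Minimal sketch of an anchoring probe for a code-generation model.
    # `generate` is a hypothetical placeholder; plug in a real model
    # client (e.g., an API call) to run the probe for real.

    def generate(prompt: str) -> str:
        """Hypothetical model call; replace with a real code-generation API."""
        raise NotImplementedError("plug in a model client here")

    # Target task: complete a function that sums a list.
    BASE_PROMPT = (
        "def total(xs):\n"
        '    """Return the sum of the elements of xs."""\n'
    )

    # Anchor: an intentionally wrong solution placed earlier in the prompt.
    ANCHOR = (
        "def total(xs):\n"
        '    """Return the sum of the elements of xs."""\n'
        "    return len(xs)  # wrong on purpose\n\n"
    )

    def anchored_error_rate(n_samples: int = 20) -> float:
        """Fraction of sampled completions that echo the anchor's wrong body."""
        hits = sum(
            "len(xs)" in generate(ANCHOR + BASE_PROMPT)
            for _ in range(n_samples)
        )
        return hits / n_samples

Comparing anchored_error_rate against the same statistic computed on BASE_PROMPT alone would indicate how strongly the anchor pulls the model's output, in the spirit of the framing and anchoring results reported in the abstract.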
Pages: 15
Related Papers
50 records in total
  • [1] Benchmarking Cognitive Biases in Large Language Models as Evaluators
    Koo, Ryan
    Lee, Minhwa
    Raheja, Vipul
    Park, Jongin
    Kim, Zae Myung
    Kang, Dongyeop
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 517-545
  • [2] (Ir)rationality and cognitive biases in large language models
    Macmillan-Scott, Olivia
    Musolesi, Mirco
    ROYAL SOCIETY OPEN SCIENCE, 2024, 11 (06)
  • [3] Evaluation and mitigation of cognitive biases in medical language models
    Schmidgall, Samuel
    Harris, Carl
    Essien, Ime
    Olshvang, Daniel
    Rahman, Tawsifur
    Kim, Ji Woong
    Ziaei, Rojin
    Eshraghian, Jason
    Abadir, Peter
    Chellappa, Rama
    NPJ DIGITAL MEDICINE, 2024, 7 (01)
  • [4] Biases in Large Language Models: Origins, Inventory, and Discussion
    Navigli, Roberto
    Conia, Simone
    Ross, Bjorn
    ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2023, 15 (02)
  • [5] Evaluating Large Language Models with NeuBAROCO: Syllogistic Reasoning Ability and Human-like Biases
    Ando, Risako
    Morishita, Takanobu
    Abe, Hirohiko
    Mineshima, Koji
    Okada, Mitsuhiro
    arXiv, 2023
  • [6] Performance and biases of Large Language Models in public opinion simulation
    Qu, Yao
    Wang, Jue
    HUMANITIES & SOCIAL SCIENCES COMMUNICATIONS, 2024, 11 (01)
  • [7] Confirmation and Specificity Biases in Large Language Models: An Explorative Study
    O'Leary, Daniel E.
    IEEE INTELLIGENT SYSTEMS, 2025, 40 (01): 63-68
  • [8] Large language models show human-like content biases in transmission chain experiments
    Acerbi, Alberto
    Stubbersfield, Joseph M.
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2023, 120 (44)
  • [9] Detecting implicit biases of large language models with Bayesian hypothesis testing
    Si, Shijing
    Jiang, Xiaoming
    Su, Qinliang
    Carin, Lawrence
    SCIENTIFIC REPORTS, 15 (1)
  • [10] Symbol ungrounding: what the successes (and failures) of large language models reveal about human cognition
    Dove, Guy
    PHILOSOPHICAL TRANSACTIONS OF THE ROYAL SOCIETY B-BIOLOGICAL SCIENCES, 2024, 379 (1911)