Do Large Language Models Show Human-like Biases? Exploring Confidence-Competence Gap in AI

Cited by: 1
Authors
Singh, Aniket Kumar [1 ]
Lamichhane, Bishal [2 ]
Devkota, Suman [3 ]
Dhakal, Uttam [3 ]
Dhakal, Chandra [4 ]
Affiliations
[1] Youngstown State Univ, Dept Comp Sci & Informat Syst, Youngstown, OH 44555 USA
[2] Univ Nevada, Dept Math & Stat, Reno, NV 89557 USA
[3] Youngstown State Univ, Dept Elect & Comp Engn, Youngstown, OH 44555 USA
[4] Univ Georgia, Dept Agr & Appl Econ, Athens, GA 30602 USA
Keywords
Large Language Models; Dunning-Kruger effect; ChatGPT; BARD; Claude; LLaMA; cognitive biases; artificial intelligence; AI ethics; Natural Language Processing; confidence assessment
DOI
10.3390/info15020092
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
This study investigates self-assessment tendencies in Large Language Models (LLMs), examining whether their patterns resemble human cognitive biases such as the Dunning-Kruger effect. LLMs, including GPT, BARD, Claude, and LLaMA, are evaluated using confidence scores on reasoning tasks. The models provide self-assessed confidence levels before and after responding to different questions. The results show cases where high confidence does not correspond to correctness, suggesting overconfidence. Conversely, low confidence despite accurate responses indicates potential underestimation. Confidence scores vary across problem categories and difficulty levels, with confidence generally decreasing for more complex queries. GPT-4 displays consistent confidence, while LLaMA and Claude show greater variation. Some of these patterns resemble the Dunning-Kruger effect, in which incompetence leads to inflated self-evaluations. While not conclusive, these observations parallel that phenomenon and provide a foundation for further exploring the alignment of competence and confidence in LLMs. As LLMs continue to expand their societal roles, further research into their self-assessment mechanisms is warranted to fully understand their capabilities and limitations.
Pages: 20
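
The abstract above describes a pre-/post-answer confidence-elicitation protocol and a comparison of self-reported confidence against correctness. Below is a minimal sketch of such a protocol; it is not taken from the paper. The prompts, the `ask_model` stub, and the `overconfidence_gap` metric are illustrative assumptions, not the authors' exact prompts or analysis.

```python
# Hypothetical sketch of a pre/post confidence-elicitation protocol.
# `ask_model` stands in for any chat-completion call (GPT, BARD, Claude, LLaMA)
# and is stubbed here so the script runs standalone.

from dataclasses import dataclass


@dataclass
class Trial:
    question: str
    correct: bool        # whether the model's answer was judged correct
    conf_before: float   # self-reported confidence (0-100) before answering
    conf_after: float    # self-reported confidence (0-100) after answering


def ask_model(prompt: str) -> str:
    """Stub standing in for a real LLM API call; always returns '50'."""
    return "50"


def run_trial(question: str, grade) -> Trial:
    # 1. Ask for confidence before the model sees its own answer.
    conf_before = float(ask_model(
        "On a scale of 0-100, how confident are you that you can answer the "
        f"following question correctly? Reply with a number only.\n{question}"))
    # 2. Get the answer itself.
    answer = ask_model(question)
    # 3. Ask for confidence again, now that an answer has been produced.
    conf_after = float(ask_model(
        f"Your answer was: {answer}\nOn a scale of 0-100, how confident are "
        "you that this answer is correct? Reply with a number only."))
    return Trial(question, grade(answer), conf_before, conf_after)


def overconfidence_gap(trials) -> float:
    """Mean post-answer confidence minus accuracy, in percentage points.
    Positive values indicate overconfidence (a Dunning-Kruger-like pattern)."""
    mean_conf = sum(t.conf_after for t in trials) / len(trials)
    accuracy = 100 * sum(t.correct for t in trials) / len(trials)
    return mean_conf - accuracy


if __name__ == "__main__":
    trials = [run_trial("What is 17 * 24?", lambda a: a.strip() == "408")]
    print(f"Overconfidence gap: {overconfidence_gap(trials):+.1f} points")
```

The stub keeps the sketch self-contained; in practice `ask_model` would wrap a real chat-completion API, the numeric confidence strings would need validation before parsing, and trials would be aggregated per model and per problem category to reproduce the kind of comparison the abstract reports.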