Do Large Language Models Show Decision Heuristics Similar to Humans? A Case Study Using GPT-3.5

Cited by: 13
Authors
Suri, Gaurav [1]
Slater, Lily R. [1]
Ziaee, Ali [1]
Nguyen, Morgan [1]
Affiliations
[1] San Francisco State Univ, Dept Psychol, Mind Brain & Behav, 1600 Holloway Ave, San Francisco, CA 94132 USA
Keywords
natural language processing; Large Language Models; ChatGPT; heuristics; PHYSICIANS; JUDGMENT; CHOICE
DOI
10.1037/xge0001547
Chinese Library Classification number
B84 [Psychology]
Discipline classification code
04; 0402
Abstract
A Large Language Model (LLM) is an artificial intelligence system trained on vast amounts of natural language data, enabling it to generate human-like responses to written or spoken language input. Generative Pre-Trained Transformer (GPT)-3.5 is an example of an LLM that supports a conversational agent called ChatGPT. In this work, we used a series of novel prompts to determine whether ChatGPT shows heuristics and other context-sensitive responses. We also tested the same prompts on human participants. Across four studies, we found that ChatGPT was influenced by random anchors in making estimates (anchoring, Study 1); it judged the likelihood of two events occurring together to be higher than the likelihood of either event occurring alone, and it was influenced by anecdotal information (representativeness and availability heuristics, Study 2); it found an item to be more efficacious when its features were presented positively rather than negatively, even though both presentations contained statistically equivalent information (framing effect, Study 3); and it valued an owned item more than a newly found item even though the two items were objectively identical (endowment effect, Study 4). In each study, human participants showed similar effects. Heuristics and context-sensitive responses in humans are thought to be driven by cognitive and affective processes such as loss aversion and effort reduction. The fact that an LLM, which lacks these processes, also shows such responses invites consideration of the possibility that language is sufficiently rich to carry these effects and may play a role in generating these effects in humans.
Pages: 1066-1075
Page count: 10