Do Large Language Models Show Decision Heuristics Similar to Humans? A Case Study Using GPT-3.5

Cited by: 13

Authors:

Suri, Gaurav [1]

Slater, Lily R. [1]

Ziaee, Ali [1]

Nguyen, Morgan [1]

Affiliations:

[1] San Francisco State Univ, Dept Psychol, Mind Brain & Behav, 1600 Holloway Ave, San Francisco, CA 94132 USA

Source:

JOURNAL OF EXPERIMENTAL PSYCHOLOGY-GENERAL | 2024 / Volume 153 / Issue 04

Keywords:

natural language processing; Large Language Models; ChatGPT; heuristics; PHYSICIANS; JUDGMENT; CHOICE;

DOI:

10.1037/xge0001547

Chinese Library Classification:

B84 [Psychology];

Discipline classification code:

04 ; 0402 ;

Abstract:

A Large Language Model (LLM) is an artificial intelligence system trained on vast amounts of natural language data, enabling it to generate human-like responses to written or spoken language input. Generative Pre-Trained Transformer (GPT)-3.5 is an example of an LLM that supports a conversational agent called ChatGPT. In this work, we used a series of novel prompts to determine whether ChatGPT shows heuristics and other context-sensitive responses. We also tested the same prompts on human participants. Across four studies, we found that ChatGPT was influenced by random anchors in making estimates (anchoring, Study 1); it judged the likelihood of two events occurring together to be higher than the likelihood of either event occurring alone, and it was influenced by anecdotal information (representativeness and availability heuristics, Study 2); it found an item to be more efficacious when its features were presented positively rather than negatively, even though both presentations contained statistically equivalent information (framing effect, Study 3); and it valued an owned item more than a newly found item even though the two items were objectively identical (endowment effect, Study 4). In each study, human participants showed similar effects. Heuristics and context-sensitive responses in humans are thought to be driven by cognitive and affective processes such as loss aversion and effort reduction. The fact that an LLM, which lacks these processes, also shows such responses invites consideration of the possibility that language is sufficiently rich to carry these effects and may play a role in generating these effects in humans.

Pages: 1066-1075

Page count: 10
