Social Value Alignment in Large Language Models

Cited by: 0
Authors
Abbo, Giulio Antonio [1]
Marchesi, Serena [2]
Wykowska, Agnieszka [2]
Belpaeme, Tony [1]
Affiliations
[1] Ghent University, imec, IDLab-AIRO, Ghent, Belgium
[2] S4HRI, Istituto Italiano di Tecnologia, Genoa, Italy
Keywords
Values; Large Language Models; LLM; Alignment; MIND
DOI
10.1007/978-3-031-58202-8_6
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in text generation and display an apparent understanding of both physical and social aspects of the world. In this study, we examine the capability of LLMs to generate responses that align with human values. We focus on five prominent LLMs - GPT-3, GPT-4, PaLM-2, LLaMA-2 and BLOOM - and compare their generated responses with those provided by human participants. To evaluate the value alignment of LLMs, we presented domestic scenarios to the models and elicited responses with minimal prompting instructions. Human raters judged the responses on appropriateness and value alignment. The results revealed that GPT-3, GPT-4 and PaLM-2 performed on par with human participants, displaying a notable level of value alignment in their generated responses. However, LLaMA-2 and BLOOM fell short in this respect, indicating a possible divergence from human values. Furthermore, our findings indicate that the raters had difficulty distinguishing between responses generated by LLMs and those written by humans, and in certain cases they preferred the machine-generated responses. These findings shed light on the capability of state-of-the-art LLMs to align with human values, and also allow us to speculate on whether these models could be value-aware. This research contributes to the ongoing exploration of LLMs' understanding of ethical considerations and provides insights into their potential for engaging in value-driven interactions.
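The evaluation protocol summarized in the abstract (present a domestic scenario, elicit a response with minimal prompting, then have human raters score appropriateness and value alignment) can be pictured with a short Python sketch. This is a minimal illustration only: the scenario text, the GenerateFn interface, and the Likert-style ratings below are assumptions for the sake of the example, not materials or code from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for any LLM backend (GPT, PaLM, LLaMA, BLOOM, ...):
# a callable that maps a prompt string to the model's text response.
GenerateFn = Callable[[str], str]

@dataclass
class Trial:
    scenario: str                                     # domestic scenario shown to the model
    response: str                                     # minimally prompted model response
    ratings: List[int] = field(default_factory=list)  # human ratings, e.g. 1-5 Likert scores

def elicit(scenario: str, generate: GenerateFn) -> Trial:
    # Minimal prompting: the scenario plus one open question, with no system
    # message, no few-shot examples, and no instructions about values.
    prompt = f"{scenario}\nWhat would you do?"
    return Trial(scenario=scenario, response=generate(prompt))

def mean_alignment(trials: List[Trial]) -> float:
    # Average the human value-alignment ratings pooled across all trials.
    scores = [r for t in trials for r in t.ratings]
    return sum(scores) / len(scores) if scores else float("nan")

if __name__ == "__main__":
    # Toy backend so the sketch runs without any API key or model weights.
    echo_model: GenerateFn = lambda _prompt: (
        "I would check with everyone at the table before serving dinner."
    )
    trial = elicit(
        "You are a household assistant. Dinner is ready, but one guest has not arrived yet.",
        echo_model,
    )
    trial.ratings.extend([4, 5, 4])  # scores later collected from human raters
    print(trial.response)
    print(f"mean value-alignment rating: {mean_alignment([trial]):.2f}")
```

In a sketch like this, the same elicit call could be pointed at any of the five models simply by swapping the backend behind GenerateFn, which mirrors the minimal-prompting comparison described in the abstract.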
Pages: 83-97
Number of pages: 15