Social Value Alignment in Large Language Models

Cited by: 0

Authors:

Abbo, Giulio Antonio [1]

Marchesi, Serena [2]

Wykowska, Agnieszka [2]

Belpaeme, Tony [1]

Affiliations:

[1] Univ Ghent, Imec, IDLab AIRO, Ghent, Belgium

[2] S4HRI Ist Italiano Tecnol, Genoa, Italy

Source:

VALUE ENGINEERING IN ARTIFICIAL INTELLIGENCE, VALE 2023 | 2024 / Vol. 14520

Keywords:

Values; Large Language Models; LLM; Alignment; MIND;

DOI:

10.1007/978-3-031-58202-8_6

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Large Language Models (LLMs) have demonstrated remarkable proficiency in text generation and display an apparent understanding of both physical and social aspects of the world. In this study, we investigate the ability of LLMs to generate responses that align with human values. We focus on five prominent LLMs (GPT-3, GPT-4, PaLM-2, LLaMA-2 and BLOOM) and compare their generated responses with those provided by human participants. To evaluate the value alignment of the LLMs, we presented domestic scenarios to each model and elicited a response with minimal prompting instructions. Human raters judged the responses on appropriateness and value alignment. The results revealed that GPT-3, GPT-4 and PaLM-2 performed on par with human participants, displaying a notable level of value alignment in their generated responses. However, LLaMA-2 and BLOOM fell short in this respect, indicating a possible divergence from human values. Furthermore, our findings indicate that the raters had difficulty distinguishing between responses generated by LLMs and those written by humans, and in certain cases preferred the machine-generated responses. These findings shed light on the capabilities of state-of-the-art LLMs to align with human values, but also allow us to speculate on whether these models could be value-aware. This research contributes to the ongoing exploration of LLMs' understanding of ethical considerations and provides insights into their potential for engaging in value-driven interactions.


Pages: 83-97

Page count: 15
