Social Value Alignment in Large Language Models

Cited by: 0
Authors
Abbo, Giulio Antonio [1]
Marchesi, Serena [2]
Wykowska, Agnieszka [2]
Belpaeme, Tony [1]
Affiliations
[1] Ghent University, imec, IDLab-AIRO, Ghent, Belgium
[2] S4HRI, Istituto Italiano di Tecnologia, Genoa, Italy
Keywords
Values; Large Language Models; LLM; Alignment; MIND
DOI
10.1007/978-3-031-58202-8_6
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in text generation and display an apparent understanding of both physical and social aspects of the world. In this study, we examine the capability of LLMs to generate responses that align with human values. We focus on five prominent LLMs - GPT-3, GPT-4, PaLM-2, LLaMA-2 and BLOOM - and compare their generated responses with those provided by human participants. To evaluate value alignment, we presented domestic scenarios to each model and elicited responses with minimal prompting instructions. Human raters judged the responses on appropriateness and value alignment. The results revealed that GPT-3, GPT-4 and PaLM-2 performed on par with human participants, displaying a notable level of value alignment in their generated responses. LLaMA-2 and BLOOM, however, fell short in this respect, indicating a possible divergence from human values. Furthermore, raters found it difficult to distinguish responses generated by LLMs from those written by humans, and in some cases preferred the machine-generated responses. These findings shed light on the capability of state-of-the-art LLMs to align with human values, and also allow us to speculate on whether these models could be value-aware. This research contributes to the ongoing exploration of LLMs' understanding of ethical considerations and provides insight into their potential for engaging in value-driven interactions.
Pages: 83-97
Number of pages: 15