Social Value Alignment in Large Language Models

Cited by: 0
|
Authors
Abbo, Giulio Antonio [1]
Marchesi, Serena [2 ]
Wykowska, Agnieszka [2 ]
Belpaeme, Tony [1 ]
Affiliations
[1] Ghent University, imec, IDLab-AIRO, Ghent, Belgium
[2] S4HRI, Istituto Italiano di Tecnologia, Genoa, Italy
Keywords
Values; Large Language Models; LLM; Alignment; MIND
DOI
10.1007/978-3-031-58202-8_6
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) have demonstrated remarkable proficiency in text generation and display an apparent understanding of both physical and social aspects of the world. In this study, we examine the capability of LLMs to generate responses that align with human values. We focus on five prominent LLMs - GPT-3, GPT-4, PaLM-2, LLaMA-2 and BLOOM - and compare their generated responses with those provided by human participants. To evaluate the value alignment of the LLMs, we presented domestic scenarios to each model and elicited a response with minimal prompting instructions. Human raters then judged the responses on appropriateness and value alignment. The results revealed that GPT-3, GPT-4 and PaLM-2 performed on par with human participants, displaying a notable level of value alignment in their generated responses. However, LLaMA-2 and BLOOM fell short in this respect, indicating a possible divergence from human values. Furthermore, our findings indicate that raters had difficulty distinguishing between responses generated by LLMs and those written by humans, and in certain cases even preferred the machine-generated responses. These findings shed light on the capability of state-of-the-art LLMs to align with human values, and also allow us to speculate on whether these models could be value-aware. This research contributes to the ongoing exploration of LLMs' understanding of ethical considerations and provides insights into their potential for engaging in value-driven interactions.
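
The abstract describes the elicitation procedure only in outline. As a rough illustration, the sketch below shows how minimal-instruction prompting of one of the listed models might look, using the OpenAI Python client. The scenario text, prompt wording, model name and sampling temperature are illustrative assumptions, not the authors' actual materials.

    # Minimal sketch, assuming the OpenAI Python client (openai>=1.0).
    # Scenario text, model choice and temperature are hypothetical,
    # not taken from the paper.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # One illustrative domestic scenario; the study used its own set.
    scenarios = [
        "You are helping out in a family home. A child asks you to keep "
        "a broken vase secret from their parents. What do you do?",
    ]

    def elicit(scenario: str, model: str = "gpt-4") -> str:
        """Present one scenario with minimal instructions; return the reply."""
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": scenario}],
            temperature=0.7,
        )
        return resp.choices[0].message.content

    responses = {s: elicit(s) for s in scenarios}
    # Per the abstract, such responses would then be judged by human
    # raters for appropriateness and value alignment, alongside
    # human-written answers.
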
Pages: 83-97
Page count: 15
Related Papers
50 entries in total (10 shown)
  • [1] On the Calibration of Large Language Models and Alignment
    Zhu, Chiwei
    Xu, Benfeng
    Wang, Quan
    Zhang, Yongdong
    Mao, Zhendong
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023: 9778-9795
  • [2] Fundamental Limitations of Alignment in Large Language Models
    Wolf, Yotam
    Wies, Noam
    Avnery, Oshri
    Levine, Yoav
    Shashua, Amnon
arXiv, 2023
  • [3] Hybrid Alignment Training for Large Language Models
    Wang, Chenglong
    Zhou, Hang
    Chang, Kaiyan
    Li, Bei
    Mu, Yongyu
    Xiao, Tong
    Liu, Tongran
    Zhu, Jingbo
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024: 11389-11403
  • [4] Investigating Cultural Alignment of Large Language Models
    AlKhamissi, Badr
    ElNokrashy, Muhammad
    AlKhamissi, Mai
    Diab, Mona
PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 12404-12422
  • [5] Cultural bias and cultural alignment of large language models
    Tao, Yan
    Viberg, Olga
    Baker, Ryan S.
    Kizilcec, Rene F.
PNAS NEXUS, 2024, 3 (09)
  • [6] Unlocking the Power of Large Language Models for Entity Alignment
    Jiang, Xuhui
    Shen, Yinghan
    Shi, Zhichao
    Xu, Chengjin
    Li, Wei
    Li, Zixuan
    Guo, Jian
    Shen, Huawei
    Wang, Yuanzhuo
PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024: 7566-7583
  • [7] Strong and weak alignment of large language models with human values
    Khamassi, Mehdi
    Nahon, Marceau
    Chatila, Raja
SCIENTIFIC REPORTS, 2024, 14 (01)
  • [8] A survey on multilingual large language models: corpora, alignment, and bias
    Xu, Yuemei
    Hu, Ling
    Zhao, Jiayi
    Qiu, Zihan
    Xu, Kexin
    Ye, Yuqi
    Gu, Hanwen
    FRONTIERS OF COMPUTER SCIENCE, 2025, 19 (11)
  • [9] Language as a cognitive and social tool at the time of large language models
    Borghi, Anna M.
    De Livio, Chiara
    Gervasi, Angelo Mattia
    Mannella, Francesco
    Nolfi, Stefano
    Tummolini, Luca
JOURNAL OF CULTURAL COGNITIVE SCIENCE, 2024, 8 (03): 179-198
  • [10] Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations
    Achintalwar, Swapnaja
    Baldini, Ioana
    Bouneffouf, Djallel
    Byamugisha, Joan
    Chang, Maria
    Dognin, Pierre
    Farchi, Eitan
    Makondo, Ndivhuwo
    Mojsilovic, Aleksandra
    Nagireddy, Manish
    Ramamurthy, Karthikeyan Natesan
    Padhi, Inkit
    Raz, Orna
    Rios, Jesus
    Sattigeri, Prasanna
    Singh, Moninder
    Thwala, Siphiwe A.
    Uceda-Sosa, Rosario A.
    Varshney, Kush R.
IEEE INTERNET COMPUTING, 2024, 28 (05): 28-36