CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models

Cited by: 0
Authors
Yu, Linhao [1 ]
Leng, Yongqi [1 ]
Huang, Yufei [1 ]
Wu, Shang [2 ]
Liu, Haixin [3 ]
Ji, Xinmeng [3 ]
Zhao, Jiahui [1 ]
Song, Jinwang [3 ]
Cui, Tingting [3 ]
Cheng, Xiaoqing [3 ]
Liu, Tao [3 ]
Xiong, Deyi [1 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
[2] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Yunnan, Peoples R China
[3] Zhengzhou Univ, Sch Comp & Artificial Intelligence, Zhengzhou, Henan, Peoples R China
Keywords
AI
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
How would a large language model (LLM) respond in ethically relevant contexts? In this paper, we curate CMoralEval, a large benchmark for the morality evaluation of Chinese LLMs. The data sources of CMoralEval are twofold: 1) a Chinese TV program that discusses Chinese moral norms through stories from society and 2) a collection of Chinese moral anomies gathered from various newspapers and academic papers on morality. With these sources, we aim to create a moral evaluation dataset characterized by diversity and authenticity. We develop a morality taxonomy and a set of fundamental moral principles that are not only rooted in traditional Chinese culture but also consistent with contemporary societal norms. To facilitate efficient construction and annotation of instances in CMoralEval, we establish a platform with AI-assisted instance generation to streamline the annotation process. With these, we curate CMoralEval, which encompasses both explicit moral scenarios (14,964 instances) and moral dilemma scenarios (15,424 instances), each drawing instances from the different data sources. We conduct extensive experiments with CMoralEval to examine a variety of Chinese LLMs. Experimental results demonstrate that CMoralEval is a challenging benchmark for Chinese LLMs. The dataset is publicly available at https://github.com/tjunlp-lab/CMoralEval.
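The abstract describes multiple-choice moral scenarios on which Chinese LLMs are scored. As a rough illustration of how such a benchmark is typically consumed, the sketch below evaluates a model on CMoralEval-style instances; the instance field names (scenario, options, answer), the file layout, and the query_model stub are assumptions made for illustration and are not taken from the paper or its repository.

import json
import random
import string

def load_instances(path):
    """Load benchmark instances from a JSON file.

    Each instance is assumed (for illustration only) to carry a moral
    scenario, a list of candidate responses, and the index of the
    reference answer; the real CMoralEval schema may differ.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def build_prompt(instance):
    """Format one instance as a lettered multiple-choice prompt."""
    letters = string.ascii_uppercase
    lines = [instance["scenario"], ""]
    for letter, option in zip(letters, instance["options"]):
        lines.append(f"{letter}. {option}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)

def query_model(prompt, num_options):
    """Placeholder for the evaluated LLM; replace with a real API call.

    Here it guesses uniformly at random so the script runs end to end.
    """
    return random.choice(string.ascii_uppercase[:num_options])

def evaluate(instances):
    """Return the accuracy of the (stub) model over the given instances."""
    correct = 0
    for instance in instances:
        prompt = build_prompt(instance)
        prediction = query_model(prompt, len(instance["options"]))
        if prediction == string.ascii_uppercase[instance["answer"]]:
            correct += 1
    return correct / len(instances) if instances else 0.0

if __name__ == "__main__":
    # Tiny in-memory example instead of reading from disk.
    demo = [{
        "scenario": "A passerby finds a lost wallet containing cash and an ID card.",
        "options": ["Keep the cash", "Return the wallet to its owner", "Ignore it"],
        "answer": 1,
    }]
    print(f"accuracy: {evaluate(demo):.2f}")

In practice, query_model would wrap the evaluated Chinese LLM's completion or chat API, and accuracy could be reported separately for the explicit moral and moral dilemma subsets.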
Pages: 11817-11837
Number of pages: 21