CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models

Cited by: 0
Authors
Yu, Linhao [1 ]
Leng, Yongqi [1 ]
Huang, Yufei [1 ]
Wu, Shang [2 ]
Liu, Haixin [3 ]
Ji, Xinmeng [3 ]
Zhao, Jiahui [1 ]
Song, Jinwang [3 ]
Cui, Tingting [3 ]
Cheng, Xiaoqing [3 ]
Liu, Tao [3 ]
Xiong, Deyi [1 ]
Affiliations
[1] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
[2] Kunming Univ Sci & Technol, Fac Informat Engn & Automat, Kunming, Yunnan, Peoples R China
[3] Zhengzhou Univ, Sch Comp & Artificial Intelligence, Zhengzhou, Henan, Peoples R China
Keywords
AI
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
How would a large language model (LLM) respond in ethically relevant contexts? In this paper, we curate CMoralEval, a large benchmark for the morality evaluation of Chinese LLMs. The data sources of CMoralEval are twofold: 1) a Chinese TV program that discusses Chinese moral norms through stories from society and 2) a collection of Chinese moral anomies gathered from various newspapers and academic papers on morality. With these sources, we aim to create a moral evaluation dataset characterized by diversity and authenticity. We develop a morality taxonomy and a set of fundamental moral principles that are not only rooted in traditional Chinese culture but also consistent with contemporary societal norms. To facilitate efficient construction and annotation of instances in CMoralEval, we establish a platform with AI-assisted instance generation to streamline the annotation process. With these, we curate CMoralEval, which encompasses both explicit moral scenarios (14,964 instances) and moral dilemma scenarios (15,424 instances), each drawing instances from the different data sources. We conduct extensive experiments with CMoralEval to examine a variety of Chinese LLMs. Experimental results demonstrate that CMoralEval is a challenging benchmark for Chinese LLMs. The dataset is publicly available at https://github.com/tjunlp-lab/CMoralEval.
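The abstract describes multiple-choice moral scenarios on which Chinese LLMs are scored. As a rough illustration of how such a benchmark is typically consumed, the sketch below evaluates a model on CMoralEval-style instances; the instance field names (scenario, options, answer), the file layout, and the query_model stub are assumptions made for illustration and are not taken from the paper or its repository.

import json
import random
import string

def load_instances(path):
    """Load benchmark instances from a JSON file.

    Each instance is assumed (for illustration only) to carry a moral
    scenario, a list of candidate responses, and the index of the
    reference answer; the real CMoralEval schema may differ.
    """
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def build_prompt(instance):
    """Format one instance as a lettered multiple-choice prompt."""
    letters = string.ascii_uppercase
    lines = [instance["scenario"], ""]
    for letter, option in zip(letters, instance["options"]):
        lines.append(f"{letter}. {option}")
    lines.append("Answer with a single letter.")
    return "\n".join(lines)

def query_model(prompt, num_options):
    """Placeholder for the evaluated LLM; replace with a real API call.

    Here it guesses uniformly at random so the script runs end to end.
    """
    return random.choice(string.ascii_uppercase[:num_options])

def evaluate(instances):
    """Return the accuracy of the (stub) model over the given instances."""
    correct = 0
    for instance in instances:
        prompt = build_prompt(instance)
        prediction = query_model(prompt, len(instance["options"]))
        if prediction == string.ascii_uppercase[instance["answer"]]:
            correct += 1
    return correct / len(instances) if instances else 0.0

if __name__ == "__main__":
    # Tiny in-memory example instead of reading from disk.
    demo = [{
        "scenario": "A passerby finds a lost wallet containing cash and an ID card.",
        "options": ["Keep the cash", "Return the wallet to its owner", "Ignore it"],
        "answer": 1,
    }]
    print(f"accuracy: {evaluate(demo):.2f}")

In practice, query_model would wrap the evaluated Chinese LLM's completion or chat API, and accuracy could be reported separately for the explicit moral and moral dilemma subsets.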
Pages: 11817-11837
Number of pages: 21