Automated Assessment of Fidelity and Interpretability: An Evaluation Framework for Large Language Models' Explanations (Student Abstract)

Authors
Kuo, Mu-Tien [1, 2]
Hsueh, Chih-Chung [1, 2]
Tsai, Richard Tzong-Han [2, 3]
Affiliations
[1] Chingshin Acad, Taipei, Taiwan
[2] Acad Sinica, Res Ctr Humanities & Social Sci, Taipei, Taiwan
[3] Natl Cent Univ, Dept Comp Sci & Engn, Taoyuan, Taiwan
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21 | 2024
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As Large Language Models (LLMs) become more prevalent in various fields, it is crucial to rigorously assess the quality of their explanations. Our research introduces a task-agnostic framework for evaluating free-text rationales, drawing on insights from both linguistics and machine learning. We evaluate two dimensions of explainability: fidelity and interpretability. For fidelity, we propose methods suitable for proprietary LLMs where direct introspection of internal features is unattainable. For interpretability, we use language models instead of human evaluators, addressing concerns about subjectivity and scalability in evaluations. We apply our framework to evaluate GPT-3.5 and the impact of prompts on the quality of its explanations. In conclusion, our framework streamlines the evaluation of explanations from LLMs, promoting the development of safer models.
Pages: 23554-23555
Page count: 2