Automated Assessment of Fidelity and Interpretability: An Evaluation Framework for Large Language Models' Explanations (Student Abstract)

Cited by: 0

Authors:

Kuo, Mu-Tien [1, 2]

Hsueh, Chih-Chung [1, 2]

Tsai, Richard Tzong-Han [2, 3]

Affiliations:

[1] Chingshin Acad, Taipei, Taiwan

[2] Acad Sinica, Res Ctr Humanities & Social Sci, Taipei, Taiwan

[3] Natl Cent Univ, Dept Comp Sci & Engn, Taoyuan, Taiwan

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21 | 2024

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

As Large Language Models (LLMs) become more prevalent in various fields, it is crucial to rigorously assess the quality of their explanations. Our research introduces a task-agnostic framework for evaluating free-text rationales, drawing on insights from both linguistics and machine learning. We evaluate two dimensions of explainability: fidelity and interpretability. For fidelity, we propose methods suitable for proprietary LLMs where direct introspection of internal features is unattainable. For interpretability, we use language models instead of human evaluators, addressing concerns about subjectivity and scalability in evaluations. We apply our framework to evaluate GPT-3.5 and the impact of prompts on the quality of its explanations. In conclusion, our framework streamlines the evaluation of explanations from LLMs, promoting the development of safer models.


Pages: 23554-23555

Page count: 2
