Automated Assessment of Fidelity and Interpretability: An Evaluation Framework for Large Language Models' Explanations (Student Abstract)

Cited by: 0
Authors
Kuo, Mu-Tien [1 ,2 ]
Hsueh, Chih-Chung [1 ,2 ]
Tsai, Richard Tzong-Han [2 ,3 ]
Affiliations
[1] Chingshin Acad, Taipei, Taiwan
[2] Acad Sinica, Res Ctr Humanities & Social Sci, Taipei, Taiwan
[3] Natl Cent Univ, Dept Comp Sci & Engn, Taoyuan, Taiwan
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21 | 2024
Keywords: none listed
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
As Large Language Models (LLMs) become more prevalent in various fields, it is crucial to rigorously assess the quality of their explanations. Our research introduces a task-agnostic framework for evaluating free-text rationales, drawing on insights from both linguistics and machine learning. We evaluate two dimensions of explainability: fidelity and interpretability. For fidelity, we propose methods suitable for proprietary LLMs where direct introspection of internal features is unattainable. For interpretability, we use language models instead of human evaluators, addressing concerns about subjectivity and scalability in evaluations. We apply our framework to evaluate GPT-3.5 and the impact of prompts on the quality of its explanations. In conclusion, our framework streamlines the evaluation of explanations from LLMs, promoting the development of safer models.
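The abstract describes using language models in place of human evaluators for interpretability, but gives no implementation details. Purely as an illustrative sketch of the general LLM-as-judge idea, assuming a hypothetical `judge_interpretability` helper and a generic `llm` callable (both names are my own, not from the paper):

```python
from typing import Callable

def judge_interpretability(rationale: str, llm: Callable[[str], str]) -> int:
    """Ask a judge model to rate a free-text rationale on a 1-5 scale."""
    prompt = (
        "Rate the following explanation for clarity and coherence "
        "on a scale of 1 (unreadable) to 5 (fully clear). "
        "Reply with the number only.\n\nExplanation:\n" + rationale
    )
    reply = llm(prompt).strip()
    score = int(reply[0])      # parse the leading digit of the judge's reply
    return max(1, min(5, score))  # clamp to the valid 1-5 range

# Stubbed judge for demonstration; a real setup would call an LLM API here.
stub_llm = lambda prompt: "4"
print(judge_interpretability("The model chose B because ...", stub_llm))  # prints 4
```

Replacing the stub with an actual model call turns this into a scalable, repeatable scoring loop, which is the scalability advantage over human raters that the abstract claims.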
Pages: 23554-23555 (2 pages)
Related Papers (50 total)
  • [31] Large Language Models-Based Local Explanations of Text Classifiers
    Angiulli, Fabrizio
    De Luca, Francesco
    Fassetti, Fabio
    Nistico, Simona
    DISCOVERY SCIENCE, DS 2024, PT I, 2025, 15243 : 19 - 35
  • [32] A Survey on Evaluation of Large Language Models
    Chang, Yupeng
    Wang, Xu
    Wang, Jindong
    Wu, Yuan
    Yang, Linyi
    Zhu, Kaijie
    Chen, Hao
    Yi, Xiaoyuan
    Wang, Cunxiang
    Wang, Yidong
    Ye, Wei
    Zhang, Yue
    Chang, Yi
    Yu, Philip S.
    Yang, Qiang
    Xie, Xing
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2024, 15 (03)
  • [33] Automated Repair of Programs from Large Language Models
    National University of Singapore, Singapore
    arXiv preprint
  • [34] Large language models direct automated chemistry laboratory
    Ana Laura Dias
    Tiago Rodrigues
    Nature, 2023, 624 : 530 - 531
  • [35] Leveraging Large Language Models for Automated Dialogue Analysis
    Finch, Sarah E.
    Paek, Ellie S.
    Choi, Jinho D.
    24TH MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE, SIGDIAL 2023, 2023, : 202 - 215
  • [37] Automated Disentangled Sequential Recommendation with Large Language Models
    Wang, Xin
    Chen, Hong
    Pan, Zirui
    Zhou, Yuwei
    Guan, Chaoyu
    Sun, Lifeng
    Zhu, Wenwu
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (02)
  • [38] Automated Repair of Programs from Large Language Models
    Fan, Zhiyu
    Gao, Xiang
    Mirchev, Martin
    Roychoudhury, Abhik
    Tan, Shin Hwei
    2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE, 2023, : 1469 - 1481
  • [39] Biases Mitigation and Expressiveness Preservation in Language Models: A Comprehensive Pipeline (Student Abstract)
    Yu, Liu
    Guo, Ludie
    Kuang, Ping
    Zhou, Fan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23701 - 23702
  • [40] Large Language Models, scientific knowledge and factuality: A framework to streamline human expert evaluation
    Wysocka, Magdalena
    Wysocki, Oskar
    Delmas, Maxime
    Mutel, Vincent
    Freitas, Andre
    JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 158