Automated Assessment of Fidelity and Interpretability: An Evaluation Framework for Large Language Models' Explanations (Student Abstract)

Cited by: 0
Authors
Kuo, Mu-Tien [1 ,2 ]
Hsueh, Chih-Chung [1 ,2 ]
Tsai, Richard Tzong-Han [2 ,3 ]
Affiliations
[1] Chingshin Acad, Taipei, Taiwan
[2] Acad Sinica, Res Ctr Humanities & Social Sci, Taipei, Taiwan
[3] Natl Cent Univ, Dept Comp Sci & Engn, Taoyuan, Taiwan
Keywords: not listed
DOI: not available
CLC Number: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
As Large Language Models (LLMs) become more prevalent in various fields, it is crucial to rigorously assess the quality of their explanations. Our research introduces a task-agnostic framework for evaluating free-text rationales, drawing on insights from both linguistics and machine learning. We evaluate two dimensions of explainability: fidelity and interpretability. For fidelity, we propose methods suitable for proprietary LLMs where direct introspection of internal features is unattainable. For interpretability, we use language models instead of human evaluators, addressing concerns about subjectivity and scalability in evaluations. We apply our framework to evaluate GPT-3.5 and the impact of prompts on the quality of its explanations. In conclusion, our framework streamlines the evaluation of explanations from LLMs, promoting the development of safer models.
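The abstract does not spell out the framework's concrete procedures, but a minimal Python sketch can illustrate the two axes it names: a black-box fidelity proxy (checking whether the model's answer can be re-derived from its own rationale, with no access to internal features) and an LLM-as-judge interpretability rating in place of human evaluators. The `ask` callable, the prompts, the consistency check, and the 1-5 rubric are illustrative assumptions, not the authors' actual method.

from typing import Callable

# A black-box LLM is modeled as a plain callable: prompt in, text out.
Ask = Callable[[str], str]


def fidelity_score(ask: Ask, question: str, rationale: str, answer: str) -> float:
    """Fidelity proxy for proprietary LLMs (illustrative, not the paper's method):
    does the model reach the same final answer when conditioned only on its own
    rationale?"""
    probe = (
        f"Question: {question}\n"
        f"Reasoning: {rationale}\n"
        "Based only on the reasoning above, state the final answer."
    )
    rederived = ask(probe).strip().lower()
    return 1.0 if answer.strip().lower() in rederived else 0.0


def interpretability_score(ask: Ask, question: str, rationale: str) -> int:
    """Interpretability via an LLM judge instead of human raters (rubric is a
    placeholder): rate the rationale's clarity on a 1-5 scale."""
    probe = (
        f"Question: {question}\n"
        f"Explanation: {rationale}\n"
        "Rate how clear and easy to follow the explanation is, from 1 (opaque) "
        "to 5 (fully transparent). Reply with a single digit."
    )
    reply = ask(probe)
    digits = [c for c in reply if c.isdigit()]
    return int(digits[0]) if digits else 1


if __name__ == "__main__":
    # Toy stand-in for an API call so the sketch runs without network access.
    def mock_ask(prompt: str) -> str:
        return "4" if "Rate how clear" in prompt else "The answer is 42."

    print(fidelity_score(mock_ask, "What is 6 * 7?", "6 times 7 equals 42.", "42"))
    print(interpretability_score(mock_ask, "What is 6 * 7?", "6 times 7 equals 42."))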
Pages: 23554-23555
Number of pages: 2
Related Papers
50 records in total (10 shown)
  • [1] Evaluation of Large Language Models on Code Obfuscation (Student Abstract)
    Swindle, Adrian
    McNealy, Derrick
    Krishnan, Giri
    Ramyaa, Ramyaa
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23664 - 23666
  • [2] Automated Natural Language Explanation of Deep Visual Neurons with Large Models (Student Abstract)
    Zhao, Chenxu
    Qian, Wei
    Shi, Yucheng
    Huai, Mengdi
    Liu, Ninghao
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23712 - 23713
  • [3] A question-answering framework for automated abstract screening using large language models
    Akinseloyin, Opeoluwa
    Jiang, Xiaorui
    Palade, Vasile
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09)
  • [4] Large Language Models as Planning Domain Generators (Student Abstract)
    Oswald, James
    Srinivas, Kavitha
    Kokel, Harsha
    Lee, Junkyu
    Katz, Michael
    Sohrabi, Shirin
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23604 - 23605
  • [5] Toward cultural interpretability: A linguistic anthropological framework for describing and evaluating large language models
    Jones, Graham M.
    Satran, Shai
    Satyanarayan, Arvind
    BIG DATA & SOCIETY, 2025, 12 (01):
  • [6] Workshop on Large Language Models' Interpretability and Trustworthiness (LLMIT)
    Saha, Tulika
    Ganguly, Debasis
    Saha, Sriparna
    Mitra, Prasenjit
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 5290 - 5293
  • [7] Large Language Models as Evaluators for Recommendation Explanations
    Zhang, Xiaoyu
    Li, Yishan
    Wang, Jiayin
    Sun, Bowen
    Ma, Weizhi
    Sun, Peijie
    Zhang, Min
    PROCEEDINGS OF THE EIGHTEENTH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2024, 2024, : 33 - 42
  • [8] Revisiting Automated Topic Model Evaluation with Large Language Models
    Stammbach, Dominik
    Zouhar, Vilem
    Hoyle, Alexander
    Sachan, Mrinmaya
    Ash, Elliott
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 9348 - 9357
  • [9] Quantifying Uncertainty in Natural Language Explanations of Large Language Models
    Tanneru, Sree Harsha
    Agarwal, Chirag
    Lakkaraju, Himabindu
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [10] Using Large Language Models for Automated Grading of Student Writing about Science
    Impey, Chris
    Wenger, Matthew
    Garuda, Nikhil
    Golchin, Shahriar
    Stamer, Sarah
    INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION, 2025,