Automated Assessment of Fidelity and Interpretability: An Evaluation Framework for Large Language Models' Explanations (Student Abstract)

Cited by: 0
Authors
Kuo, Mu-Tien [1 ,2 ]
Hsueh, Chih-Chung [1 ,2 ]
Tsai, Richard Tzong-Han [2 ,3 ]
Affiliations
[1] Chingshin Acad, Taipei, Taiwan
[2] Acad Sinica, Res Ctr Humanities & Social Sci, Taipei, Taiwan
[3] Natl Cent Univ, Dept Comp Sci & Engn, Taoyuan, Taiwan
Source
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21 | 2024
Keywords: none listed
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
As Large Language Models (LLMs) become more prevalent in various fields, it is crucial to rigorously assess the quality of their explanations. Our research introduces a task-agnostic framework for evaluating free-text rationales, drawing on insights from both linguistics and machine learning. We evaluate two dimensions of explainability: fidelity and interpretability. For fidelity, we propose methods suitable for proprietary LLMs where direct introspection of internal features is unattainable. For interpretability, we use language models instead of human evaluators, addressing concerns about subjectivity and scalability in evaluations. We apply our framework to evaluate GPT-3.5 and the impact of prompts on the quality of its explanations. In conclusion, our framework streamlines the evaluation of explanations from LLMs, promoting the development of safer models.
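The abstract describes using language models in place of human evaluators for interpretability, but gives no implementation details. Purely as an illustrative sketch of the general LLM-as-judge idea, assuming a hypothetical `judge_interpretability` helper and a generic `llm` callable (both names are my own, not from the paper):

```python
from typing import Callable

def judge_interpretability(rationale: str, llm: Callable[[str], str]) -> int:
    """Ask a judge model to rate a free-text rationale on a 1-5 scale."""
    prompt = (
        "Rate the following explanation for clarity and coherence "
        "on a scale of 1 (unreadable) to 5 (fully clear). "
        "Reply with the number only.\n\nExplanation:\n" + rationale
    )
    reply = llm(prompt).strip()
    score = int(reply[0])      # parse the leading digit of the judge's reply
    return max(1, min(5, score))  # clamp to the valid 1-5 range

# Stubbed judge for demonstration; a real setup would call an LLM API here.
stub_llm = lambda prompt: "4"
print(judge_interpretability("The model chose B because ...", stub_llm))  # prints 4
```

Replacing the stub with an actual model call turns this into a scalable, repeatable scoring loop, which is the scalability advantage over human raters that the abstract claims.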
Pages: 23554-23555 (2 pages)
Related Papers (50 total)
  • [31] Large Language Models-Based Local Explanations of Text Classifiers
    Angiulli, Fabrizio
    De Luca, Francesco
    Fassetti, Fabio
    Nistico, Simona
    DISCOVERY SCIENCE, DS 2024, PT I, 2025, 15243 : 19 - 35
  • [32] A Survey on Evaluation of Large Language Models
    Chang, Yupeng
    Wang, Xu
    Wang, Jindong
    Wu, Yuan
    Yang, Linyi
    Zhu, Kaijie
    Chen, Hao
    Yi, Xiaoyuan
    Wang, Cunxiang
    Wang, Yidong
    Ye, Wei
    Zhang, Yue
    Chang, Yi
    Yu, Philip S.
    Yang, Qiang
    Xie, Xing
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2024, 15 (03)
  • [33] Automated Repair of Programs from Large Language Models
    National University of Singapore, Singapore
    arXiv preprint
  • [34] Large language models direct automated chemistry laboratory
    Ana Laura Dias
    Tiago Rodrigues
    Nature, 2023, 624 : 530 - 531
  • [35] Leveraging Large Language Models for Automated Dialogue Analysis
    Finch, Sarah E.
    Paek, Ellie S.
    Choi, Jinho D.
    24TH MEETING OF THE SPECIAL INTEREST GROUP ON DISCOURSE AND DIALOGUE, SIGDIAL 2023, 2023, : 202 - 215
  • [37] Automated Disentangled Sequential Recommendation with Large Language Models
    Wang, Xin
    Chen, Hong
    Pan, Zirui
    Zhou, Yuwei
    Guan, Chaoyu
    Sun, Lifeng
    Zhu, Wenwu
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (02)
  • [38] Automated Repair of Programs from Large Language Models
    Fan, Zhiyu
    Gao, Xiang
    Mirchev, Martin
    Roychoudhury, Abhik
    Tan, Shin Hwei
    2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE, 2023, : 1469 - 1481
  • [39] Biases Mitigation and Expressiveness Preservation in Language Models: A Comprehensive Pipeline (Student Abstract)
    Yu, Liu
    Guo, Ludie
    Kuang, Ping
    Zhou, Fan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23701 - 23702
  • [40] Large Language Models, scientific knowledge and factuality: A framework to streamline human expert evaluation
    Wysocka, Magdalena
    Wysocki, Oskar
    Delmas, Maxime
    Mutel, Vincent
    Freitas, Andre
    JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 158