Automated Assessment of Fidelity and Interpretability: An Evaluation Framework for Large Language Models' Explanations (Student Abstract)

Cited by: 0
Authors
Kuo, Mu-Tien [1 ,2 ]
Hsueh, Chih-Chung [1 ,2 ]
Tsai, Richard Tzong-Han [2 ,3 ]
Affiliations
[1] Chingshin Acad, Taipei, Taiwan
[2] Acad Sinica, Res Ctr Humanities & Social Sci, Taipei, Taiwan
[3] Natl Cent Univ, Dept Comp Sci & Engn, Taoyuan, Taiwan
Keywords: not listed
DOI: not available
CLC Number: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
As Large Language Models (LLMs) become more prevalent in various fields, it is crucial to rigorously assess the quality of their explanations. Our research introduces a task-agnostic framework for evaluating free-text rationales, drawing on insights from both linguistics and machine learning. We evaluate two dimensions of explainability: fidelity and interpretability. For fidelity, we propose methods suitable for proprietary LLMs where direct introspection of internal features is unattainable. For interpretability, we use language models instead of human evaluators, addressing concerns about subjectivity and scalability in evaluations. We apply our framework to evaluate GPT-3.5 and the impact of prompts on the quality of its explanations. In conclusion, our framework streamlines the evaluation of explanations from LLMs, promoting the development of safer models.
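The abstract does not spell out the framework's concrete procedures, but a minimal Python sketch can illustrate the two axes it names: a black-box fidelity proxy (checking whether the model's answer can be re-derived from its own rationale, with no access to internal features) and an LLM-as-judge interpretability rating in place of human evaluators. The `ask` callable, the prompts, the consistency check, and the 1-5 rubric are illustrative assumptions, not the authors' actual method.

from typing import Callable

# A black-box LLM is modeled as a plain callable: prompt in, text out.
Ask = Callable[[str], str]


def fidelity_score(ask: Ask, question: str, rationale: str, answer: str) -> float:
    """Fidelity proxy for proprietary LLMs (illustrative, not the paper's method):
    does the model reach the same final answer when conditioned only on its own
    rationale?"""
    probe = (
        f"Question: {question}\n"
        f"Reasoning: {rationale}\n"
        "Based only on the reasoning above, state the final answer."
    )
    rederived = ask(probe).strip().lower()
    return 1.0 if answer.strip().lower() in rederived else 0.0


def interpretability_score(ask: Ask, question: str, rationale: str) -> int:
    """Interpretability via an LLM judge instead of human raters (rubric is a
    placeholder): rate the rationale's clarity on a 1-5 scale."""
    probe = (
        f"Question: {question}\n"
        f"Explanation: {rationale}\n"
        "Rate how clear and easy to follow the explanation is, from 1 (opaque) "
        "to 5 (fully transparent). Reply with a single digit."
    )
    reply = ask(probe)
    digits = [c for c in reply if c.isdigit()]
    return int(digits[0]) if digits else 1


if __name__ == "__main__":
    # Toy stand-in for an API call so the sketch runs without network access.
    def mock_ask(prompt: str) -> str:
        return "4" if "Rate how clear" in prompt else "The answer is 42."

    print(fidelity_score(mock_ask, "What is 6 * 7?", "6 times 7 equals 42.", "42"))
    print(interpretability_score(mock_ask, "What is 6 * 7?", "6 times 7 equals 42."))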
Pages: 23554-23555
Number of pages: 2
Related Papers
50 records in total (10 shown)
  • [1] Evaluation of Large Language Models on Code Obfuscation (Student Abstract)
    Swindle, Adrian
    McNealy, Derrick
    Krishnan, Giri
    Ramyaa, Ramyaa
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23664 - 23666
  • [2] Automated Natural Language Explanation of Deep Visual Neurons with Large Models (Student Abstract)
    Zhao, Chenxu
    Qian, Wei
    Shi, Yucheng
    Huai, Mengdi
    Liu, Ninghao
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23712 - 23713
  • [3] A question-answering framework for automated abstract screening using large language models
    Akinseloyin, Opeoluwa
    Jiang, Xiaorui
    Palade, Vasile
    JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2024, 31 (09)
  • [4] Large Language Models as Planning Domain Generators (Student Abstract)
    Oswald, James
    Srinivas, Kavitha
    Kokel, Harsha
    Lee, Junkyu
    Katz, Michael
    Sohrabi, Shirin
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23604 - 23605
  • [5] Toward cultural interpretability: A linguistic anthropological framework for describing and evaluating large language models
    Jones, Graham M.
    Satran, Shai
    Satyanarayan, Arvind
    BIG DATA & SOCIETY, 2025, 12 (01):
  • [6] Workshop on Large Language Models' Interpretability and Trustworthiness (LLMIT)
    Saha, Tulika
    Ganguly, Debasis
    Saha, Sriparna
    Mitra, Prasenjit
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 5290 - 5293
  • [7] Large Language Models as Evaluators for Recommendation Explanations
    Zhang, Xiaoyu
    Li, Yishan
    Wang, Jiayin
    Sun, Bowen
    Ma, Weizhi
    Sun, Peijie
    Zhang, Min
    PROCEEDINGS OF THE EIGHTEENTH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2024, 2024, : 33 - 42
  • [8] Revisiting Automated Topic Model Evaluation with Large Language Models
    Stammbach, Dominik
    Zouhar, Vilem
    Hoyle, Alexander
    Sachan, Mrinmaya
    Ash, Elliott
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 9348 - 9357
  • [9] Quantifying Uncertainty in Natural Language Explanations of Large Language Models
    Tanneru, Sree Harsha
    Agarwal, Chirag
    Lakkaraju, Himabindu
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [10] Using Large Language Models for Automated Grading of Student Writing about Science
    Impey, Chris
    Wenger, Matthew
    Garuda, Nikhil
    Golchin, Shahriar
    Stamer, Sarah
    INTERNATIONAL JOURNAL OF ARTIFICIAL INTELLIGENCE IN EDUCATION, 2025,