Interpretability and explainability of AI are becoming increasingly important in light of the rapid development of large language models (LLMs). This paper investigates the interpretability of LLMs in the context of knowledge-based question answering. The central hypothesis of the study is that correct and incorrect model behavior can be distinguished at the level of hidden states. The quantized models LLaMA-2-7B-Chat, Mistral-7B, and Vicuna-7B, together with the MuSeRC question-answering dataset, are used to test this hypothesis. The results of the analysis support the hypothesis. We also identify the layers that have a negative effect on the model's behavior. As a practical application of the hypothesis, we propose additionally fine-tuning such "weak" layers in order to improve the quality of the task solution.
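The abstract does not spell out an implementation, but the probing setup it describes can be sketched as follows: extract per-layer hidden states for QA prompts and fit a simple classifier per layer to separate correct from incorrect model behavior, with low-accuracy layers flagged as "weak." In the sketch below, the model name, last-token pooling, placeholder data, and the logistic-regression probe are all illustrative assumptions, not the authors' exact method; the paper's models are quantized, which would require passing a quantization config at load time.

```python
# Minimal per-layer probing sketch (assumptions: last-token pooling,
# logistic-regression probes, placeholder data instead of real MuSeRC items).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sklearn.linear_model import LogisticRegression

MODEL = "meta-llama/Llama-2-7b-chat-hf"  # one of the three models studied

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.float16,
    device_map="auto",
    output_hidden_states=True,  # expose activations of every layer
)
model.eval()

@torch.no_grad()
def last_token_states(prompt: str):
    """Return one vector per layer: the hidden state of the final token."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    hidden = model(**inputs).hidden_states  # tuple: embeddings + one entry per layer
    return [h[0, -1].float().cpu().numpy() for h in hidden]

# Placeholders: in practice, render MuSeRC passage/question pairs as prompts
# and label each item 1 if the model answered it correctly, else 0.
prompts = ["<passage + question 1>", "<passage + question 2>"]
labels = [1, 0]

# Transpose to per-layer feature lists: per_layer[i] holds one vector per example.
per_layer = list(zip(*[last_token_states(p) for p in prompts]))

# Fit a probe per layer; with real data, score on a held-out split instead.
for layer_idx, feats in enumerate(per_layer):
    probe = LogisticRegression(max_iter=1000).fit(list(feats), labels)
    print(f"layer {layer_idx:2d}: probe accuracy = {probe.score(list(feats), labels):.2f}")
```

Under this reading, layers whose probes separate correct from incorrect behavior poorly would be the "weak" layers that the abstract proposes to fine-tune further.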