C2F-Explainer: Explaining Transformers Better Through a Coarse-to-Fine Strategy

Cited by: 1

Authors:

Ding, Weiping [1]

Cheng, Xiaotian [1]

Geng, Yu [1]

Huang, Jiashuang [1]

Ju, Hengrong [1]

Affiliation:

[1] Nantong Univ, Sch Artificial Intelligence & Comp Sci, Nantong 226019, Peoples R China

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2024 / Vol. 36 / No. 12

Funding:

China Postdoctoral Science Foundation; National Natural Science Foundation of China;

Keywords:

Transformers; Head; Feature extraction; Computer vision; Visualization; Semantics; Computational modeling; Interpretable method; perturbation mask; self-attention mechanism; sequential three-way decision;

DOI:

10.1109/TKDE.2024.3443888

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Transformer interpretability is an active research topic in deep learning. Traditional interpretation methods mostly use the final-layer output of the Transformer encoder as masks to generate an explanation map. However, these approaches overlook two crucial aspects. At the coarse-grained level, the mask may contain uncertain information, including unreliable and incomplete object location data; at the fine-grained level, information is lost on the mask, resulting in spatial noise and loss of detail. To address these issues, we propose a two-stage coarse-to-fine strategy (C2F-Explainer) for improving Transformer interpretability. Specifically, we first design a sequential three-way mask (S3WM) module to handle the problem of uncertain information at the coarse-grained level. This module uses sequential three-way decisions to process the mask, preventing uncertain information on the mask from affecting the interpretation results and thus yielding coarse-grained interpretation results with accurate positions. Second, to further reduce the impact of information loss at the fine-grained level, we devise an attention fusion (AF) module, inspired by the fact that self-attention can capture global semantic information. AF aggregates the attention matrices to generate a cross-layer relation matrix, which is then used to optimize the details of the interpretation results and produce fine-grained interpretation results with clear and complete edges. Experimental results show that the proposed C2F-Explainer achieves good interpretation results on both natural and medical image datasets, and that the mIoU is improved by 2.08% on the PASCAL VOC 2012 dataset.
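As a rough illustration of the two stages described in the abstract, the Python sketch below first resolves per-patch mask scores with a sequential three-way decision (accept, reject, or defer to the next level) and then refines the coarse result with a relation matrix built from several attention layers. It is not the authors' implementation: the abstract does not state the thresholds, where the per-level scores come from, or how attention is aggregated, so the function names `sequential_three_way_mask`, `fuse_attention`, and `refine`, the threshold values, the toy scores, and the mean-over-layers fusion are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sequential_three_way_mask(
    levels: List[Vector], thresholds: List[Tuple[float, float]]
) -> Vector:
    """Hypothetical coarse stage: decide each patch over successive levels.

    At each level a still-undecided patch is accepted (score >= alpha),
    rejected (score <= beta), or deferred to the next level. Patches that
    are still deferred after the last level are excluded (set to 0.0) so
    that uncertain regions do not influence the coarse mask.
    """
    decided: List[Optional[float]] = [None] * len(levels[0])
    for scores, (alpha, beta) in zip(levels, thresholds):
        for i, score in enumerate(scores):
            if decided[i] is not None:
                continue  # already accepted or rejected at an earlier level
            if score >= alpha:
                decided[i] = 1.0
            elif score <= beta:
                decided[i] = 0.0
    return [d if d is not None else 0.0 for d in decided]


def fuse_attention(attn_layers: List[Matrix]) -> Matrix:
    """Assumed fusion rule: element-wise mean of the per-layer attention."""
    n = len(attn_layers[0])
    count = len(attn_layers)
    return [
        [sum(a[i][j] for a in attn_layers) / count for j in range(n)]
        for i in range(n)
    ]


def refine(relation: Matrix, coarse: Vector) -> Vector:
    """Propagate coarse scores between patches via the relation matrix."""
    return [sum(r * c for r, c in zip(row, coarse)) for row in relation]


# Toy example with four patches and two decision levels (made-up scores).
levels = [[0.9, 0.5, 0.1, 0.55], [0.9, 0.8, 0.1, 0.5]]
thresholds = [(0.7, 0.3), (0.6, 0.4)]
coarse = sequential_three_way_mask(levels, thresholds)

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
uniform = [[0.25] * 4 for _ in range(4)]
relation = fuse_attention([identity, uniform])
fine = refine(relation, coarse)

print(coarse)  # [1.0, 1.0, 0.0, 0.0]
print(fine)    # [0.75, 0.75, 0.25, 0.25]
```

In this toy run, patch 1 is deferred at the first level and accepted at the second, while patch 3 stays in the boundary region at both levels and is therefore dropped from the coarse mask. The refinement step then spreads some score to neighbouring patches through the fused attention.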


Pages: 7708-7724

Page count: 17
