Zero-Shot Translation of Attention Patterns in VQA Models to Natural Language

Cited by: 0

Authors:

Salewski, Leonard [1]

Koepke, A. Sophia [1]

Lensch, Hendrik P. A. [1]

Akata, Zeynep [1,2]

Affiliations:

[1] Univ Tubingen, Tubingen, Germany

[2] MPI Intelligent Syst, Tubingen, Germany

Source:

PATTERN RECOGNITION, DAGM GCPR 2023 | 2024 / Vol. 14264

Keywords:

Zero-Shot Translation of Attention Patterns; VQA;

DOI:

10.1007/978-3-031-54605-1_25

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Converting a model's internals to text can yield human-understandable insights about the model. Inspired by the recent success of training-free approaches for image captioning, we propose ZS-A2T, a zero-shot framework that translates the transformer attention of a given model into natural language without requiring any training. We consider this in the context of Visual Question Answering (VQA). ZS-A2T builds on a pre-trained large language model (LLM), which receives a task prompt, the question, and the predicted answer as inputs. The LLM is guided to select tokens that describe the regions in the input image that the VQA model attended to. Crucially, we determine how well a candidate token matches the attended regions by exploiting the text-image matching capabilities of the underlying VQA model. Our framework does not require any training and allows the drop-in replacement of different guiding sources (e.g., attribution instead of attention maps) or language models. We evaluate this novel task on textual explanation datasets for VQA, achieving state-of-the-art performance in the zero-shot setting on GQA-REX and VQA-X. Our code is available online.
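The guided token selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`guided_greedy_decode`, `lm_next_logprobs`, `match_score`), the additive combination of the two scores, and the `top_k` and `weight` parameters are all assumptions made for illustration. A language model proposes candidate next tokens, and a separate matching function, standing in for the VQA model's text-image matching on the attended regions, re-ranks them.

```python
import math
from typing import Callable, Dict, List


def guided_greedy_decode(
    lm_next_logprobs: Callable[[List[str]], Dict[str, float]],
    match_score: Callable[[List[str]], float],
    prompt: List[str],
    top_k: int = 3,
    weight: float = 1.0,
    max_new_tokens: int = 5,
    eos: str = "<eos>",
) -> List[str]:
    """Greedy decoding where a guidance score re-ranks the LM's top-k tokens.

    Hypothetical sketch: `lm_next_logprobs` stands in for the LLM and
    `match_score` for the text-image matching signal; the additive
    combination below is an assumption, not the paper's formula.
    """
    generated: List[str] = []
    for _ in range(max_new_tokens):
        logprobs = lm_next_logprobs(prompt + generated)
        # Keep only the top-k tokens proposed by the language model.
        candidates = sorted(logprobs, key=logprobs.get, reverse=True)[:top_k]
        # Re-rank by LM log-probability plus the weighted guidance score.
        best = max(
            candidates,
            key=lambda tok: logprobs[tok] + weight * match_score(generated + [tok]),
        )
        if best == eos:
            break
        generated.append(best)
    return generated


def toy_lm(context: List[str]) -> Dict[str, float]:
    # Toy language model: prefers "sky", but ends the text after "grass".
    if context and context[-1] == "grass":
        return {"<eos>": math.log(0.8), "sky": math.log(0.1), "grass": math.log(0.1)}
    return {"sky": math.log(0.5), "grass": math.log(0.4), "<eos>": math.log(0.1)}


def toy_match(tokens: List[str]) -> float:
    # Toy guidance: pretend the attended image region shows grass.
    return 1.0 if tokens[-1] == "grass" else 0.0


guided = guided_greedy_decode(toy_lm, toy_match, prompt=["because", "the"])
unguided = guided_greedy_decode(
    toy_lm, toy_match, prompt=["because", "the"], weight=0.0, max_new_tokens=2
)
print(guided, unguided)
```

In this toy setting the guidance term overrides the language model's slight preference for "sky". Because both scoring functions are passed in as callables, swapping the guiding source or the language model only requires passing a different function, which mirrors the drop-in replacement property the abstract mentions.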


Pages: 378-393

Page count: 16
