Zero-Shot Translation of Attention Patterns in VQA Models to Natural Language

Cited by: 0
Authors
Salewski, Leonard [1 ]
Koepke, A. Sophia [1 ]
Lensch, Hendrik P. A. [1 ]
Akata, Zeynep [1 ,2 ]
Affiliations
[1] Univ Tubingen, Tubingen, Germany
[2] MPI Intelligent Syst, Tubingen, Germany
Keywords
Zero-Shot Translation of Attention Patterns; VQA;
DOI
10.1007/978-3-031-54605-1_25
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Converting a model's internals to text can yield human-understandable insights about the model. Inspired by the recent success of training-free approaches for image captioning, we propose ZS-A2T, a zero-shot framework that translates the transformer attention of a given model into natural language without requiring any training. We consider this in the context of Visual Question Answering (VQA). ZS-A2T builds on a pre-trained large language model (LLM), which receives a task prompt, question, and predicted answer as inputs. The LLM is guided to select tokens that describe the regions in the input image that the VQA model attended to. Crucially, this guidance is derived from the text-image matching capabilities of the underlying VQA model. Our framework does not require any training and allows the drop-in replacement of different guiding sources (e.g. attribution instead of attention maps) or language models. We evaluate this novel task on textual explanation datasets for VQA, achieving state-of-the-art performance in the zero-shot setting on GQA-REX and VQA-X. Our code is available here.
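
The guided decoding the abstract describes can be made concrete with a short sketch. The snippet below is an illustrative approximation, not the authors' released implementation: it assumes a GPT-2 causal LM from Hugging Face transformers as the LLM, and it substitutes CLIP's image-text matching score for the VQA backbone's matching head (which ZS-A2T actually reuses); it also scores candidates against the whole image rather than only the regions the VQA model attended to.

```python
# Sketch of LLM decoding guided by an image-text matching score, in the
# spirit of ZS-A2T. Model choices (GPT-2, CLIP) are illustrative stand-ins.
import torch
from PIL import Image
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          CLIPModel, CLIPProcessor)

lm_tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

@torch.no_grad()
def match_score(image: Image.Image, text: str) -> float:
    # Stand-in for the VQA model's image-text matching head; ZS-A2T reuses
    # the VQA backbone itself, CLIP just keeps this sketch self-contained.
    inputs = clip_proc(text=[text], images=image,
                       return_tensors="pt", truncation=True)
    return clip(**inputs).logits_per_image.item()

@torch.no_grad()
def guided_decode(image: Image.Image, prompt: str,
                  max_new_tokens: int = 25, top_k: int = 10) -> str:
    ids = lm_tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        next_logits = lm(ids).logits[0, -1]
        # The LLM proposes its top-k next tokens ...
        candidates = torch.topk(next_logits, top_k).indices.tolist()
        # ... which are re-ranked by how well the extended text matches
        # the image content (ZS-A2T restricts this to attended regions).
        best = max(candidates, key=lambda t: match_score(
            image, lm_tok.decode(ids[0].tolist() + [t])))
        ids = torch.cat([ids, torch.tensor([[best]])], dim=1)
        if best == lm_tok.eos_token_id:
            break
    return lm_tok.decode(ids[0], skip_special_tokens=True)
```

Under these assumptions, a call such as guided_decode(Image.open("kite.jpg"), "Question: What is the man holding? Answer: a kite. He is holding") would produce a visually grounded continuation; the point of ZS-A2T is that both the matching signal and the attended regions come from the same VQA model being explained, so no extra training is needed.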
Pages: 378-393 (16 pages)
Related Papers (showing the first 10 of 50)
  • [1] Modularized Zero-shot VQA with Pre-trained Models
    Cao, Rui
    Jiang, Jing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 58 - 76
  • [2] Exploring Question Decomposition for Zero-Shot VQA
    Khan, Zaid
    Kumar, Vijay B. G.
    Schulter, Samuel
    Chandraker, Manmohan
    Fu, Yun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] JOINT MUSIC AND LANGUAGE ATTENTION MODELS FOR ZERO-SHOT MUSIC TAGGING
    Du, Xingjian
    Yu, Zhesong
    Lin, Jiaju
    Zhu, Bilei
    Kong, Qiuqiang
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 1126 - 1130
  • [4] Zero-shot Natural Language Video Localization
    Nam, Jinwoo
    Ahn, Daechul
    Kang, Dongyeop
    Ha, Seong Jong
    Choi, Jonghyun
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1450 - 1459
  • [5] Large Language Models are Zero-Shot Reasoners
    Kojima, Takeshi
    Gu, Shixiang Shane
    Reid, Machel
    Matsuo, Yutaka
    Iwasawa, Yusuke
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [6] Language Models as Zero-Shot Trajectory Generators
    Kwon, Teyun
    Di Palo, Norman
    Johns, Edward
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (07): : 6728 - 6735
  • [7] Improving Zero-shot Translation with Language-Independent Constraints
    Pham, Ngoc-Quan
    Niehues, Jan
    Ha, Thanh-Le
    Waibel, Alex
    FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), VOL 1: RESEARCH PAPERS, 2019, : 13 - 23
  • [8] Language Tags Matter for Zero-Shot Neural Machine Translation
    Wu, Liwei
    Cheng, Shanbo
    Wang, Mingxuan
    Li, Lei
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3001 - 3007
  • [9] Large Language Models as Zero-Shot Conversational Recommenders
    He, Zhankui
    Xie, Zhouhang
    Jha, Rahul
    Steck, Harald
    Liang, Dawen
    Feng, Yesu
    Majumder, Bodhisattwa Prasad
    Kallus, Nathan
    McAuley, Julian
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 720 - 730
  • [10] Zero-Shot Classification of Art With Large Language Models
    Tojima, Tatsuya
    Yoshida, Mitsuo
    IEEE ACCESS, 2025, 13 : 17426 - 17439