Zero-Shot Translation of Attention Patterns in VQA Models to Natural Language

Cited by: 0
Authors
Salewski, Leonard [1]
Koepke, A. Sophia [1]
Lensch, Hendrik P. A. [1]
Akata, Zeynep [1, 2]
Affiliations
[1] Univ Tubingen, Tubingen, Germany
[2] MPI Intelligent Syst, Tubingen, Germany
Source
Keywords
Zero-Shot Translation of Attention Patterns; VQA;
DOI
10.1007/978-3-031-54605-1_25
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Converting a model's internals to text can yield human-understandable insights about the model. Inspired by the recent success of training-free approaches for image captioning, we propose ZS-A2T, a zero-shot framework that translates the transformer attention of a given model into natural language without requiring any training. We consider this in the context of Visual Question Answering (VQA). ZS-A2T builds on a pre-trained large language model (LLM), which receives a task prompt, question, and predicted answer, as inputs. The LLM is guided to select tokens which describe the regions in the input image that the VQA model attended to. Crucially, we determine this similarity by exploiting the text-image matching capabilities of the underlying VQA model. Our framework does not require any training and allows the drop-in replacement of different guiding sources (e.g. attribution instead of attention maps), or language models. We evaluate this novel task on textual explanation datasets for VQA, giving state-of-the-art performances for the zero-shot setting on GQA-REX and VQA-X. Our code is available here.
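The guided token selection described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function `guided_next_token`, the `weight` and `top_k` parameters, the hard-coded log-probabilities, and the stand-in scoring function `toy_match` are all assumptions made for illustration. In the setting the abstract describes, the guidance score would come from the text-image matching capability of the VQA model applied to the attended image regions; here it is replaced by a simple word-overlap count.

```python
import math


def guided_next_token(lm_logprobs, match_score, prefix, top_k=3, weight=1.0):
    """Pick the next token by combining language-model log-probabilities
    with a guidance score (hypothetical helper, for illustration only).

    lm_logprobs: dict mapping candidate token -> log-probability.
    match_score: callable(list of tokens) -> float; higher means the text
                 better matches the attended image regions.
    """
    # Keep only the top_k most likely candidates under the language model.
    candidates = sorted(lm_logprobs, key=lm_logprobs.get, reverse=True)[:top_k]
    # Score each candidate continuation with the guidance function.
    scores = [match_score(prefix + [tok]) for tok in candidates]
    # Log-softmax over the guidance scores so they are on a log-probability scale.
    peak = max(scores)
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
    combined = {
        tok: lm_logprobs[tok] + weight * (s - log_z)
        for tok, s in zip(candidates, scores)
    }
    return max(combined, key=combined.get)


# Toy stand-ins (assumptions): a fixed next-token distribution and a
# word-overlap score in place of a real text-image matching model.
ATTENDED_WORDS = {"red", "ball"}
toy_lm = {
    "the": math.log(0.5),
    "a": math.log(0.3),
    "red": math.log(0.15),
    "dog": math.log(0.05),
}


def toy_match(tokens):
    # Two points for every token that names something in the attended region.
    return 2.0 * sum(tok in ATTENDED_WORDS for tok in tokens)


prefix = ["because", "of", "the"]
guided = guided_next_token(toy_lm, toy_match, prefix)
unguided = guided_next_token(toy_lm, toy_match, prefix, weight=0.0)
print(guided, unguided)
```

With `weight=0.0` the choice falls back to the language model's most likely token, while a positive weight shifts the choice toward tokens that the guidance score associates with the attended region. Swapping `toy_match` for a different scoring function mirrors the drop-in replacement of guiding sources mentioned in the abstract.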
Pages: 378 - 393
Number of pages: 16