Zero-Shot Translation of Attention Patterns in VQA Models to Natural Language

Cited by: 0
Authors
Salewski, Leonard [1]
Koepke, A. Sophia [1]
Lensch, Hendrik P. A. [1]
Akata, Zeynep [1,2]
Affiliations
[1] Univ Tubingen, Tubingen, Germany
[2] MPI Intelligent Syst, Tubingen, Germany
Source
Keywords
Zero-Shot Translation of Attention Patterns; VQA;
DOI
10.1007/978-3-031-54605-1_25
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Converting a model's internals to text can yield human-understandable insights about the model. Inspired by the recent success of training-free approaches for image captioning, we propose ZS-A2T, a zero-shot framework that translates the transformer attention of a given model into natural language without requiring any training. We consider this in the context of Visual Question Answering (VQA). ZS-A2T builds on a pre-trained large language model (LLM), which receives a task prompt, the question, and the predicted answer as inputs. The LLM is guided to select tokens that describe the regions in the input image that the VQA model attended to. Crucially, the similarity between candidate tokens and the attended image regions is determined by exploiting the text-image matching capabilities of the underlying VQA model. Our framework does not require any training and allows drop-in replacement of different guiding sources (e.g. attribution instead of attention maps) or language models. We evaluate this novel task on textual explanation datasets for VQA, achieving state-of-the-art zero-shot performance on GQA-REX and VQA-X. Our code is publicly available.
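The abstract describes a guided decoding procedure: the LLM proposes candidate continuations, and the VQA model's text-image matching score steers token selection toward descriptions of the attended image regions. The Python sketch below illustrates one possible form of such a loop; the helper callables lm_next_token_logprobs and itm_score, the guidance weight alpha, and the end-of-sequence token are illustrative assumptions, not the authors' implementation.

# Minimal sketch of visually guided zero-shot decoding (hypothetical, not the authors' code).
from typing import Callable, Dict

def guided_decode(
    prompt: str,
    lm_next_token_logprobs: Callable[[str], Dict[str, float]],  # LLM: text -> {candidate token: log-probability}
    itm_score: Callable[[str], float],  # image-text matching score of the extended text vs. the attended image
    alpha: float = 1.0,                 # weight of the visual guidance term (assumed hyperparameter)
    top_k: int = 10,                    # number of LLM proposals re-ranked per step
    max_tokens: int = 30,
) -> str:
    """Greedy decoding in which the LLM's top-k token proposals are
    re-ranked by how well the extended text matches the attended image."""
    text = prompt
    for _ in range(max_tokens):
        logprobs = lm_next_token_logprobs(text)
        # Keep only the LLM's most likely continuations to limit matching calls.
        candidates = sorted(logprobs, key=logprobs.get, reverse=True)[:top_k]
        # Combine language-model likelihood with the image-text matching score.
        best = max(
            candidates,
            key=lambda tok: logprobs[tok] + alpha * itm_score(text + tok),
        )
        if best == "</s>":  # hypothetical end-of-sequence token
            break
        text += best
    return text

# Toy usage with stub scorers (illustration only):
# caption = guided_decode("The model focused on", my_lm, my_itm)

In the framework described above, the matching score would come from the underlying VQA model evaluated against the attention-weighted input image, which is what ties the generated text to the regions the model attended to.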
Pages: 378 - 393
Page count: 16
Related papers
50 records in total
  • [31] ENABLING ZERO-SHOT MULTILINGUAL SPOKEN LANGUAGE TRANSLATION WITH LANGUAGE-SPECIFIC ENCODERS AND DECODERS
    Escolano, Carlos
    Costa-jussa, Marta R.
    Fonollosa, Jose A. R.
    Segura, Carlos
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 694 - 701
  • [32] Zero-shot Image-to-Image Translation
    Parmar, Gaurav
    Singh, Krishna Kumar
    Zhang, Richard
    Li, Yijun
    Lu, Jingwan
    Zhu, Jun-Yan
    PROCEEDINGS OF SIGGRAPH 2023 CONFERENCE PAPERS, SIGGRAPH 2023, 2023,
  • [33] Rotation, Translation, and Cropping for Zero-Shot Generalization
    Ye, Chang
    Khalifa, Ahmed
    Bontrager, Philip
    Togelius, Julian
    2020 IEEE CONFERENCE ON GAMES (IEEE COG 2020), 2020, : 57 - 64
  • [34] Understanding and Mitigating the Uncertainty in Zero-Shot Translation
    Wang, Wenxuan
    Jiao, Wenxiang
    Wang, Shuo
    Tu, Zhaopeng
    Lyu, Michael R.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 4894 - 4904
  • [35] Zero-shot Bilingual App Reviews Mining with Large Language Models
    Wei, Jialiang
    Courbis, Anne-Lise
    Lambolais, Thomas
    Xu, Binbin
    Bernard, Pierre Louis
    Dray, Gerard
    2023 IEEE 35TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE, ICTAI, 2023, : 898 - 904
  • [36] Zero-Shot AutoML with Pretrained Models
    Oeztuerk, Ekrem
    Ferreira, Fabio
    Jomaa, Hadi S.
    Schmidt-Thieme, Lars
    Grabocka, Josif
    Hutter, Frank
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [37] Evaluating the Supervised and Zero-shot Performance of Multi-lingual Translation Models
    Hokamp, Chris
    Glover, John
    Gholipour, Demian
    FOURTH CONFERENCE ON MACHINE TRANSLATION (WMT 2019), 2019, : 209 - 217
  • [38] Large Language Models as Zero-Shot Human Models for Human-Robot Interaction
    Zhang, Bowen
    Soh, Harold
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 7961 - 7968
  • [39] Transferring Zero-shot Multilingual Chinese-Chinese Translation Model for Chinese Minority Language Translation
    Yan, Ziyue
    Zan, Hongying
    Guo, Yifan
    Xu, Hongfei
    2024 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING, IALP 2024, 2024, : 133 - 138
  • [40] Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL
    Fan, Ju
    Gu, Zihui
    Zhang, Songyue
    Zhang, Yuxin
    Chen, Zui
    Cao, Lei
    Li, Guoliang
    Madden, Samuel
    Du, Xiaoyong
    Tang, Nan
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17 (11): 2750 - 2763