SA-VQA: Structured Alignment of Visual and Semantic Representations for Visual Question Answering

Cited by: 0
Authors
Xiong, Peixi [1]
You, Quanzeng [2]
Yu, Pei [2]
Liu, Zicheng [2]
Wu, Ying [1]
Affiliations
[1] Northwestern University, United States
[2] Microsoft Research
Source
arXiv | 2022
DOI
Not available
Abstract
Deep learning
Related papers
50 in total
  • [21] Surgical-VQA: Visual Question Answering in Surgical Scenes Using Transformer
    Seenivasan, Lalithkumar
    Islam, Mobarakol
    Krishna, Adithya K.
    Ren, Hongliang
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT VII, 2022, 13437 : 33 - 43
  • [22] OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
    Marino, Kenneth
    Rastegari, Mohammad
    Farhadi, Ali
    Mottaghi, Roozbeh
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 3190 - 3199
  • [23] VQA-BC: ROBUST VISUAL QUESTION ANSWERING VIA BIDIRECTIONAL CHAINING
    Lao, Mingrui
    Guo, Yanming
    Chen, Wei
    Pu, Nan
    Lew, Michael S.
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4833 - 4837
  • [24] Learning neighbor-enhanced region representations and question-guided visual representations for visual question answering
    Gao, Ling
    Zhang, Hongda
    Sheng, Nan
    Shi, Lida
    Xu, Hao
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [25] Semantic Text Recognition via Visual Question Answering
    Beltran, Viviana
    Journet, Nicholas
    Coustaty, Mickael
    Doucet, Antoine
    2019 INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION WORKSHOPS (ICDARW), VOL 5, 2019, : 97 - 102
  • [26] Cross Modality Bias in Visual Question Answering: A Causal View With Possible Worlds VQA
    Vosoughi, Ali
    Deng, Shijian
    Zhang, Songyang
    Tian, Yapeng
    Xu, Chenliang
    Luo, Jiebo
    IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26 : 8609 - 8624
  • [27] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
    Goyal, Yash
    Khot, Tejas
    Summers-Stay, Douglas
    Batra, Dhruv
    Parikh, Devi
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6325 - 6334
  • [28] WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
    Chen, Pingyi
    Zhu, Chenglu
    Zheng, Sunyi
    Li, Honglin
    Yang, Lin
    COMPUTER VISION - ECCV 2024, PT XXXVI, 2025, 15094 : 401 - 417
  • [29] VQA-PDF: Purifying Debiased Features for Robust Visual Question Answering Task
    Bi, Yandong
    Jiang, Huajie
    Liu, Jing
    Liu, Mengting
    Hu, Yongli
    Yin, Baocai
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT XII, ICIC 2024, 2024, 14873 : 264 - 277
  • [30] Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
    Goyal, Yash
    Khot, Tejas
    Agrawal, Aishwarya
    Summers-Stay, Douglas
    Batra, Dhruv
    Parikh, Devi
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2019, 127 (04) : 398 - 414