PRIOR VISUAL RELATIONSHIP REASONING FOR VISUAL QUESTION ANSWERING

被引:0
|
作者
Yang, Zhuoqian [1 ,2 ]
Qin, Zengchang [2 ,3 ]
Yu, Jing [4 ]
Wan, Tao [5 ]
机构
[1] Carnegie Mellon Univ, Inst Robot, Pittsburgh, PA 15213 USA
[2] Beihang Univ, Sch ASEE, Intelligent Comp & Machine Learning Lab, Beijing, Peoples R China
[3] Codemao, AI Res, Shenzhen, Peoples R China
[4] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[5] Beihang Univ, Sch Biol Sci & Med Engn, Beijing, Peoples R China
关键词
VQA; GCN; Attention Mechanism;
D O I
暂无
中图分类号
TB8 [摄影技术];
学科分类号
0804 ;
摘要
Visual Question Answering (VQA) is a representative task of cross-modal reasoning where an image and a free-form question in natural language are presented and the correct answer needs to be determined using both visual and textual information. One of the key issues of VQA is to reason with semantic clues in the visual content under the guidance of the question. In this paper, we propose Scene Graph Convolutional Network (SceneGCN) to jointly reason the object properties and their semantic relations for the correct answer. The visual relationship is projected into a deep learned semantic space constrained by visual context and language priors. Based on comprehensive experiments on two challenging datasets: GQA and VQA 2.0, we demonstrate the effectiveness and interpretability of the new model.
引用
收藏
页码:1411 / 1415
页数:5
相关论文
共 50 条
  • [41] A Spatial Hierarchical Reasoning Network for Remote Sensing Visual Question Answering
    Zhang, Zixiao
    Jiao, Licheng
    Li, Lingling
    Liu, Xu
    Chen, Puhua
    Liu, Fang
    Li, Yuxuan
    Guo, Zhicheng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [42] Cross-modal Relational Reasoning Network for Visual Question Answering
    Chen, Hongyu
    Liu, Ruifang
    Peng, Bo
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 3939 - 3948
  • [43] Cascade Reasoning Network for Text-based Visual Question Answering
    Liu, Fen
    Xu, Guanghui
    Wu, Qi
    Du, Qing
    Jia, Wei
    Tan, Mingkui
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 4060 - 4069
  • [44] Medical Visual Question Answering via Conditional Reasoning and Contrastive Learning
    Liu, Bo
    Zhan, Li-Ming
    Xu, Li
    Wu, Xiao-Ming
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2023, 42 (05) : 1532 - 1545
  • [45] Hierarchical Multimodality Graph Reasoning for Remote Sensing Visual Question Answering
    Zhang, Han
    Wang, Keming
    Zhang, Laixian
    Wang, Bingshu
    Li, Xuelong
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [46] DisAVR: Disentangled Adaptive Visual Reasoning Network for Diagram Question Answering
    Wang, Yaxian
    Wei, Bifan
    Liu, Jun
    Zhang, Lingling
    Wang, Jiaxin
    Wang, Qianying
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2023, 32 : 4812 - 4827
  • [47] Hierarchical reasoning based on perception action cycle for visual question answering
    Mohamud, Safaa Abdullahi Moallim
    Jalali, Amin
    Lee, Minho
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 241
  • [48] Visual question answering method based on relational reasoning and gating mechanism
    Wang X.
    Chen Q.-H.
    Sun Q.
    Jia Y.-B.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2022, 56 (01): : 36 - 46
  • [49] Learning Hierarchical Reasoning for Text-Based Visual Question Answering
    Li, Caiyuan
    Du, Qinyi
    Wang, Qingqing
    Jin, Yaohui
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT III, 2021, 12893 : 305 - 316
  • [50] VQA: Visual Question Answering
    Antol, Stanislaw
    Agrawal, Aishwarya
    Lu, Jiasen
    Mitchell, Margaret
    Batra, Dhruv
    Zitnick, C. Lawrence
    Parikh, Devi
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2425 - 2433