Syntax Tree Constrained Graph Network for Visual Question Answering

Cited by: 0
Authors
Su, Xiangrui [1]
Zhang, Qi [2, 3]
Shi, Chongyang [1]
Liu, Jiachang [1]
Hu, Liang [2, 3]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Tongji Univ, Shanghai, Peoples R China
[3] DeepBlue Acad Sci, Shanghai, Peoples R China
Keywords
Visual question answering; Syntax tree; Message passing; Tree convolution; Graph neural network
DOI
10.1007/978-981-99-8073-4_10
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual Question Answering (VQA) aims to automatically answer natural language questions about the content of a given image. Existing VQA methods integrate vision modeling and language understanding to explore the deep semantics of the question. However, these methods ignore the syntactic information in the question, which plays a vital role in understanding its essential semantics and in guiding visual feature refinement. To fill this gap, we propose a novel Syntax Tree Constrained Graph Network (STCGN) for VQA based on entity message passing and syntax trees. The model extracts a syntax tree from each question to obtain more precise syntactic information. Specifically, we parse each question into a syntax tree using the Stanford syntax parsing tool. A hierarchical tree convolutional network then extracts syntactic phrase features and question features at the word level and the phrase level. Finally, we design a message-passing mechanism for phrase-aware visual entities and capture entity features according to a given visual context. Extensive experiments on the VQA2.0 dataset demonstrate the superiority of our proposed model.
Pages: 122-136
Page count: 15
Related papers
  • [1] Integrating Syntax Tree and Graph Neural Network for Conversational Question Answering over Heterogeneous Sources
    Li, Meiwen
    Cai, Tianyu
    Wu, Lingyan
    Chen, Li
    Ju, Shenggen
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT I, NLPCC 2024, 2025, 15359 : 83 - 96
  • [2] Scene Graph Refinement Network for Visual Question Answering
    Qian, Tianwen
    Chen, Jingjing
    Chen, Shaoxiang
    Wu, Bo
    Jiang, Yu-Gang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 3950 - 3961
  • [3] Semantic Relation Graph Reasoning Network for Visual Question Answering
    Lan, Hong
    Zhang, Pufen
    TWELFTH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING SYSTEMS, 2021, 11719
  • [4] Co-attention graph convolutional network for visual question answering
    Liu, Chuan
    Tan, Ying-Ying
    Xia, Tian-Tian
    Zhang, Jiajing
    Zhu, Ming
    MULTIMEDIA SYSTEMS, 2023, 29 (05) : 2527 - 2543
  • [5] Co-attention graph convolutional network for visual question answering
    Chuan Liu
    Ying-Ying Tan
    Tian-Tian Xia
    Jiajing Zhang
    Ming Zhu
    Multimedia Systems, 2023, 29 : 2527 - 2543
  • [6] Relation-Aware Graph Attention Network for Visual Question Answering
    Li, Linjie
    Gan, Zhe
    Cheng, Yu
    Liu, Jingjing
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019 : 10312 - 10321
  • [7] Heterogeneous Interactive Graph Network for Audio-Visual Question Answering
    Zhao, Yihan
    Xi, Wei
    Bai, Gairui
    Liu, Xinhui
    Zhao, Jizhong
    KNOWLEDGE-BASED SYSTEMS, 2024, 300
  • [8] Barlow constrained optimization for Visual Question Answering
    Jha, Abhishek
    Patro, Badri
    Van Gool, Luc
    Tuytelaars, Tinne
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023 : 1084 - 1093
  • [9] Syntax-Informed Question Answering with Heterogeneous Graph Transformer
    Zhu, Fangyi
    Tan, Lok You
    Ng, See-Kiong
    Bressan, Stephane
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2022, PT I, 2022, 13426 : 17 - 31
  • [10] Bilinear Graph Networks for Visual Question Answering
    Guo, Dalu
    Xu, Chang
    Tao, Dacheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (02) : 1023 - 1034