Syntax Tree Constrained Graph Network for Visual Question Answering

Cited by: 0
Authors
Su, Xiangrui [1 ]
Zhang, Qi [2 ,3 ]
Shi, Chongyang [1 ]
Liu, Jiachang [1 ]
Hu, Liang [2 ,3 ]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Tongji Univ, Shanghai, Peoples R China
[3] DeepBlue Acad Sci, Shanghai, Peoples R China
Keywords
Visual question answering; Syntax tree; Message passing; Tree convolution; Graph neural network;
DOI
10.1007/978-981-99-8073-4_10
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual Question Answering (VQA) aims to automatically answer natural language questions about given image content. Existing VQA methods integrate vision modeling and language understanding to explore the deep semantics of the question. However, these methods ignore the significant syntactic information of the question, which plays a vital role in understanding its essential semantics and in guiding visual feature refinement. To fill this gap, we propose a novel Syntax Tree Constrained Graph Network (STCGN) for VQA based on entity message passing and syntax trees. The model extracts a syntax tree from each question to obtain more precise syntactic information. Specifically, we parse each question into a syntax tree using the Stanford syntax parsing tool. A hierarchical tree convolutional network then extracts syntactic phrase features and question features at the word and phrase levels. We further design a phrase-aware message-passing mechanism over visual entities that captures entity features conditioned on the given visual context. Extensive experiments on the VQA 2.0 dataset demonstrate the superiority of the proposed model.
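The question-parsing step described above can be illustrated with a short sketch. The block below is a minimal, hypothetical example assuming the Python stanza library (Stanford NLP Group) as the syntax parsing front end and an illustrative question; the paper does not specify the exact parser interface, tree traversal, or phrase labels it uses.

```python
# Hypothetical sketch of the question-parsing step, assuming the Python
# "stanza" library as the Stanford syntax parsing tool (an assumption;
# the paper does not name the exact interface).
import stanza

stanza.download("en")  # fetch the default English models (run once)
nlp = stanza.Pipeline(lang="en", processors="tokenize,pos,constituency")

question = "What color is the dog on the left?"  # illustrative VQA question
tree = nlp(question).sentences[0].constituency   # root of the question syntax tree


def leaves(node):
    """Return the word tokens spanned by a constituency node."""
    if not node.children:          # leaf nodes carry the word as their label
        return [node.label]
    words = []
    for child in node.children:
        words.extend(leaves(child))
    return words


def collect_phrases(node, label="NP"):
    """Collect the word spans of all sub-trees with the given phrase label."""
    phrases = []
    if node.label == label:
        phrases.append(" ".join(leaves(node)))
    for child in node.children:
        phrases.extend(collect_phrases(child, label))
    return phrases


print(tree)                          # bracketed constituency tree of the question
print(collect_phrases(tree, "NP"))   # phrase-level spans, e.g. noun phrases
```

In the proposed model, the resulting tree and phrase spans would feed the hierarchical tree convolution and the phrase-aware message passing; those components are model-specific and are not sketched here.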
Pages: 122-136
Number of pages: 15