Syntax Tree Constrained Graph Network for Visual Question Answering

Cited by: 0
Authors
Su, Xiangrui [1 ]
Zhang, Qi [2 ,3 ]
Shi, Chongyang [1 ]
Liu, Jiachang [1 ]
Hu, Liang [2 ,3 ]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Tongji Univ, Shanghai, Peoples R China
[3] DeepBlue Acad Sci, Shanghai, Peoples R China
Keywords
Visual question answering; Syntax tree; Message passing; Tree convolution; Graph neural network;
DOI
10.1007/978-981-99-8073-4_10
Chinese Library Classification (CLC) number
TP18 [Theory of Artificial Intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual Question Answering (VQA) aims to automatically answer natural language questions about given image content. Existing VQA methods integrate vision modeling and language understanding to explore the deep semantics of the question. However, these methods ignore the significant syntactic information of the question, which plays a vital role in understanding its essential semantics and in guiding visual feature refinement. To fill this gap, we propose a novel Syntax Tree Constrained Graph Network (STCGN) for VQA based on entity message passing and syntax trees. The model extracts a syntax tree from each question to obtain more precise syntactic information. Specifically, we parse each question into a syntax tree using the Stanford syntax parsing tool. A hierarchical tree convolutional network then extracts syntactic phrase features and question features at the word and phrase levels. We further design a phrase-aware message-passing mechanism over visual entities that captures entity features conditioned on the given visual context. Extensive experiments on the VQA 2.0 dataset demonstrate the superiority of the proposed model.
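The question-parsing step described above can be illustrated with a short sketch. The block below is a minimal, hypothetical example assuming the Python stanza library (Stanford NLP Group) as the syntax parsing front end and an illustrative question; the paper does not specify the exact parser interface, tree traversal, or phrase labels it uses.

```python
# Hypothetical sketch of the question-parsing step, assuming the Python
# "stanza" library as the Stanford syntax parsing tool (an assumption;
# the paper does not name the exact interface).
import stanza

stanza.download("en")  # fetch the default English models (run once)
nlp = stanza.Pipeline(lang="en", processors="tokenize,pos,constituency")

question = "What color is the dog on the left?"  # illustrative VQA question
tree = nlp(question).sentences[0].constituency   # root of the question syntax tree


def leaves(node):
    """Return the word tokens spanned by a constituency node."""
    if not node.children:          # leaf nodes carry the word as their label
        return [node.label]
    words = []
    for child in node.children:
        words.extend(leaves(child))
    return words


def collect_phrases(node, label="NP"):
    """Collect the word spans of all sub-trees with the given phrase label."""
    phrases = []
    if node.label == label:
        phrases.append(" ".join(leaves(node)))
    for child in node.children:
        phrases.extend(collect_phrases(child, label))
    return phrases


print(tree)                          # bracketed constituency tree of the question
print(collect_phrases(tree, "NP"))   # phrase-level spans, e.g. noun phrases
```

In the proposed model, the resulting tree and phrase spans would feed the hierarchical tree convolution and the phrase-aware message passing; those components are model-specific and are not sketched here.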
Pages: 122-136
Number of pages: 15