Visual question answering via Attention-based syntactic structure tree-LSTM

Cited by: 25
Authors
Liu, Yun [1 ]
Zhang, Xiaoming [2 ]
Huang, Feiran [3 ]
Tang, Xianghong [4 ]
Li, Zhoujun [5 ]
Affiliations
[1] Beihang Univ, Beijing Key Lab Network Technol, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[3] Jinan Univ, Coll Informat Sci & Technol, Coll Cyber Secur, Guangzhou 510632, Guangdong, Peoples R China
[4] Guizhou Univ, Key Lab Adv Mfg Technol, Minist Educ, Guiyang 550025, Guizhou, Peoples R China
[5] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
Visual question answering; Visual attention; Tree-LSTM; Spatial-semantic correlation;
DOI
10.1016/j.asoc.2019.105584
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Due to the varied content of images and the free-form language of questions, the performance of Visual Question Answering (VQA) remains unsatisfactory. Existing approaches mainly infer answers from low-level visual features and sequential question words, neglecting the syntactic structure of the question sentence and its correlation with the spatial structure of the image. To address these problems, we propose a novel VQA model, the Attention-based Syntactic Structure Tree-LSTM (ASST-LSTM). Specifically, a tree-structured LSTM is used to encode the syntactic structure of the question sentence. A spatial-semantic attention model is proposed to learn the visual-textual correlation and the alignment between image regions and question words. In the attention model, a Siamese network is employed to explore the alignment between visual and textual content. The tree-structured LSTM and the spatial-semantic attention model are then integrated into a joint deep model, which is trained with multi-task learning for answer inference. Experiments conducted on three widely used VQA benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art approaches. (C) 2019 Elsevier B.V. All rights reserved.
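The abstract names two core components: a tree-structured LSTM that encodes the question's parse tree, and an attention model that aligns the question with image regions. Below is a minimal PyTorch sketch of a child-sum Tree-LSTM cell (the standard formulation of Tai et al., 2015, which tree-structured question encoders build on), paired with a plain dot-product spatial attention for illustration. All class and function names here are hypothetical, not the authors' code; the paper's spatial-semantic attention additionally uses a Siamese alignment network and multi-task training, which are not reproduced in this sketch.

import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    # Child-sum Tree-LSTM cell: each parse-tree node aggregates its
    # children's states instead of reading words left to right.
    def __init__(self, in_dim, mem_dim):
        super().__init__()
        self.iou_x = nn.Linear(in_dim, 3 * mem_dim)                # i/o/u gates from word embedding
        self.iou_h = nn.Linear(mem_dim, 3 * mem_dim, bias=False)   # i/o/u gates from children sum
        self.f_x = nn.Linear(in_dim, mem_dim)                      # forget gate, input part
        self.f_h = nn.Linear(mem_dim, mem_dim, bias=False)         # forget gate, per-child part

    def forward(self, x, child_h, child_c):
        # x: (in_dim,) word embedding at this tree node
        # child_h, child_c: (num_children, mem_dim); pass zeros(1, mem_dim) at leaves
        h_sum = child_h.sum(dim=0)
        i, o, u = torch.chunk(self.iou_x(x) + self.iou_h(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        # one forget gate per child, broadcast over the children dimension
        f = torch.sigmoid(self.f_x(x) + self.f_h(child_h))
        c = i * u + (f * child_c).sum(dim=0)
        h = o * torch.tanh(c)
        return h, c

def spatial_attention(regions, q):
    # regions: (num_regions, dim) image-region features; q: (dim,) question encoding
    weights = torch.softmax(regions @ q, dim=0)   # relevance of each region to the question
    return weights @ regions                      # attended visual feature, (dim,)

In such a design, the hidden state at the root of the parse tree serves as the question representation matched against region features; the paper replaces the plain dot-product scoring above with its spatial-semantic attention model and a Siamese network for visual-textual alignment.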
Pages: 12