Visual question answering via Attention-based syntactic structure tree-LSTM

Cited: 25
Authors
Liu, Yun [1 ]
Zhang, Xiaoming [2 ]
Huang, Feiran [3 ]
Tang, Xianghong [4 ]
Li, Zhoujun [5 ]
Affiliations
[1] Beihang Univ, Beijing Key Lab Network Technol, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[3] Jinan Univ, Coll Informat Sci & Technol, Coll Cyber Secur, Guangzhou 510632, Guangdong, Peoples R China
[4] Guizhou Univ, Key Lab Adv Mfg Technol, Minist Educ, Guiyang 550025, Guizhou, Peoples R China
[5] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
Visual question answering; Visual attention; Tree-LSTM; Spatial-semantic correlation;
DOI
10.1016/j.asoc.2019.105584
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Due to the varied patterns of images and the free-form language of questions, the performance of Visual Question Answering (VQA) still falls short of expectations. Existing approaches mainly infer answers from low-level features and sequential question words, neglecting the syntactic structure of the question sentence and its correlation with the spatial structure of the image. To address these problems, we propose a novel VQA model, i.e., the Attention-based Syntactic Structure Tree-LSTM (ASST-LSTM). Specifically, a tree-structured LSTM is used to encode the syntactic structure of the question sentence. A spatial-semantic attention model is proposed to learn the visual-textual correlation and the alignment between image regions and question words. In the attention model, a Siamese network is employed to explore the alignment between visual and textual contents. The tree-structured LSTM and the spatial-semantic attention model are then integrated into a joint deep model, which is trained with multi-task learning for answer inference. Experiments conducted on three widely used VQA benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art approaches. (C) 2019 Elsevier B.V. All rights reserved.
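For readers unfamiliar with the tree-structured question encoder the abstract refers to, the sketch below illustrates the general Child-Sum Tree-LSTM technique (Tai et al., 2015) that such models build on. It is an assumption-laden illustration, not the paper's ASST-LSTM: the class name TreeLSTMCell, the 8/16 dimensions, and the toy two-leaf parse are all hypothetical.

```python
# Minimal sketch of a Child-Sum Tree-LSTM cell (Tai et al., 2015).
# NOT the authors' ASST-LSTM; all names and sizes here are illustrative.
import torch
import torch.nn as nn

class TreeLSTMCell(nn.Module):
    """Child-Sum Tree-LSTM: children's hidden states are summed for the
    input/output/update gates, but each child gets its own forget gate."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.W_iou = nn.Linear(in_dim, 3 * hid_dim)               # i/o/u gates from node input x
        self.U_iou = nn.Linear(hid_dim, 3 * hid_dim, bias=False)  # i/o/u gates from summed child h
        self.W_f = nn.Linear(in_dim, hid_dim)                     # forget gate from x
        self.U_f = nn.Linear(hid_dim, hid_dim, bias=False)        # forget gate per child h

    def forward(self, x, child_h, child_c):
        # x: (in_dim,) embedding of the word at this tree node
        # child_h, child_c: (num_children, hid_dim); empty tensors for leaves
        h_sum = child_h.sum(dim=0)
        i, o, u = torch.chunk(self.W_iou(x) + self.U_iou(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.W_f(x).unsqueeze(0) + self.U_f(child_h))  # one gate per child
        c = i * u + (f * child_c).sum(dim=0)   # gated sum of children's memory cells
        h = o * torch.tanh(c)
        return h, c

# Toy usage: encode a two-leaf parse bottom-up; leaves have no children.
cell = TreeLSTMCell(in_dim=8, hid_dim=16)
no_h, no_c = torch.zeros(0, 16), torch.zeros(0, 16)
h1, c1 = cell(torch.randn(8), no_h, no_c)     # e.g. leaf for "red"
h2, c2 = cell(torch.randn(8), no_h, no_c)     # e.g. leaf for "car"
h_root, _ = cell(torch.randn(8), torch.stack([h1, h2]), torch.stack([c1, c2]))
```

In the paper's pipeline, such a cell would compose question-word embeddings along the syntactic parse of the question, and the resulting node states would then drive the spatial-semantic attention over image regions.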
Pages: 12
Related Papers
50 records in total
  • [1] Improving Tree-LSTM with Tree Attention
    Ahmed, Mahtab
    Samee, Muhammad Rifayat
    Mercer, Robert E.
    2019 13TH IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2019, : 247 - 254
  • [2] Question Answering over Knowledgebase with Attention-based LSTM Networks and Knowledge Embedding
    Chen, Lin
    Zeng, Guanping
    Zhang, Qingchuan
    Chen, Xingyu
    Wu, Danfeng
    2017 IEEE 16TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2017, : 243 - 246
  • [3] AMAM: An Attention-based Multimodal Alignment Model for Medical Visual Question Answering
    Pan, Haiwei
    He, Shuning
    Zhang, Kejia
    Qu, Bo
    Chen, Chunling
    Shi, Kun
    KNOWLEDGE-BASED SYSTEMS, 2022, 255
  • [4] User's Intention Understanding in Question-Answering System Using Attention-based LSTM
    Matsuyoshi, Yuki
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1752 - 1755
  • [5] Attention-based Visual Question Generation
    Patil, Charulata
    Kulkarni, Anagha
    2021 INTERNATIONAL CONFERENCE ON EMERGING SMART COMPUTING AND INFORMATICS (ESCI), 2021, : 82 - 86
  • [6] Cascading Attention Visual Question Answering Model Based on Graph Structure
    Zhang, Haoyu
    Zhang, De
    COMPUTER ENGINEERING AND APPLICATIONS, 2023, 59 (06) : 155 - 161
  • [7] MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain
    Sharma, Dhruv
    Purushotham, Sanjay
    Reddy, Chandan K.
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [8] AttenWalker: Unsupervised Long-Document Question Answering via Attention-based Graph Walking
    Nie, Yuxiang
    Huang, Heyan
    Wei, Wei
    Mao, Xian-Ling
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 13650 - 13663
  • [9] An Attention-based Bi-LSTM Method for Visual Object Classification via EEG
    Zheng, Xiao
    Chen, Wanzhong
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2021, 63