Visual question answering via Attention-based syntactic structure tree-LSTM

Cited: 25
Authors
Liu, Yun [1 ]
Zhang, Xiaoming [2 ]
Huang, Feiran [3 ]
Tang, Xianghong [4 ]
Li, Zhoujun [5 ]
Affiliations
[1] Beihang Univ, Beijing Key Lab Network Technol, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Cyber Sci & Technol, Beijing 100191, Peoples R China
[3] Jinan Univ, Coll Informat Sci & Technol, Coll Cyber Secur, Guangzhou 510632, Guangdong, Peoples R China
[4] Guizhou Univ, Key Lab Adv Mfg Technol, Minist Educ, Guiyang 550025, Guizhou, Peoples R China
[5] Beihang Univ, Sch Comp Sci & Engn, State Key Lab Software Dev Environm, Beijing 100191, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
Visual question answering; Visual attention; Tree-LSTM; Spatial-semantic correlation;
DOI
10.1016/j.asoc.2019.105584
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Due to the varied patterns of images and the free-form language of questions, the performance of Visual Question Answering (VQA) still falls short of expectations. Existing approaches mainly infer answers from low-level features and sequential question words, neglecting the syntactic structure of the question sentence and its correlation with the spatial structure of the image. To address these problems, we propose a novel VQA model, i.e., the Attention-based Syntactic Structure Tree-LSTM (ASST-LSTM). Specifically, a tree-structured LSTM is used to encode the syntactic structure of the question sentence. A spatial-semantic attention model is proposed to learn the visual-textual correlation and the alignment between image regions and question words. In the attention model, a Siamese network is employed to explore the alignment between visual and textual contents. The tree-structured LSTM and the spatial-semantic attention model are then integrated into a joint deep model, which is trained with multi-task learning for answer inference. Experiments conducted on three widely used VQA benchmark datasets demonstrate the superiority of the proposed model over state-of-the-art approaches. (C) 2019 Elsevier B.V. All rights reserved.
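For readers unfamiliar with the tree-structured question encoder the abstract refers to, the sketch below illustrates the general Child-Sum Tree-LSTM technique (Tai et al., 2015) that such models build on. It is an assumption-laden illustration, not the paper's ASST-LSTM: the class name TreeLSTMCell, the 8/16 dimensions, and the toy two-leaf parse are all hypothetical.

```python
# Minimal sketch of a Child-Sum Tree-LSTM cell (Tai et al., 2015).
# NOT the authors' ASST-LSTM; all names and sizes here are illustrative.
import torch
import torch.nn as nn

class TreeLSTMCell(nn.Module):
    """Child-Sum Tree-LSTM: children's hidden states are summed for the
    input/output/update gates, but each child gets its own forget gate."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        self.W_iou = nn.Linear(in_dim, 3 * hid_dim)               # i/o/u gates from node input x
        self.U_iou = nn.Linear(hid_dim, 3 * hid_dim, bias=False)  # i/o/u gates from summed child h
        self.W_f = nn.Linear(in_dim, hid_dim)                     # forget gate from x
        self.U_f = nn.Linear(hid_dim, hid_dim, bias=False)        # forget gate per child h

    def forward(self, x, child_h, child_c):
        # x: (in_dim,) embedding of the word at this tree node
        # child_h, child_c: (num_children, hid_dim); empty tensors for leaves
        h_sum = child_h.sum(dim=0)
        i, o, u = torch.chunk(self.W_iou(x) + self.U_iou(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.W_f(x).unsqueeze(0) + self.U_f(child_h))  # one gate per child
        c = i * u + (f * child_c).sum(dim=0)   # gated sum of children's memory cells
        h = o * torch.tanh(c)
        return h, c

# Toy usage: encode a two-leaf parse bottom-up; leaves have no children.
cell = TreeLSTMCell(in_dim=8, hid_dim=16)
no_h, no_c = torch.zeros(0, 16), torch.zeros(0, 16)
h1, c1 = cell(torch.randn(8), no_h, no_c)     # e.g. leaf for "red"
h2, c2 = cell(torch.randn(8), no_h, no_c)     # e.g. leaf for "car"
h_root, _ = cell(torch.randn(8), torch.stack([h1, h2]), torch.stack([c1, c2]))
```

In the paper's pipeline, such a cell would compose question-word embeddings along the syntactic parse of the question, and the resulting node states would then drive the spatial-semantic attention over image regions.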
Pages: 12
Related Papers
50 records in total
  • [1] Improving Tree-LSTM with Tree Attention
    Ahmed, Mahtab
    Samee, Muhammad Rifayat
    Mercer, Robert E.
    2019 13TH IEEE INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC), 2019, : 247 - 254
  • [2] Question Answering over Knowledgebase with Attention-based LSTM Networks and Knowledge Embedding
    Chen, Lin
    Zeng, Guanping
    Zhang, Qingchuan
    Chen, Xingyu
    Wu, Danfeng
    2017 IEEE 16TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC), 2017, : 243 - 246
  • [3] AMAM: An Attention-based Multimodal Alignment Model for Medical Visual Question Answering
    Pan, Haiwei
    He, Shuning
    Zhang, Kejia
    Qu, Bo
    Chen, Chunling
    Shi, Kun
    KNOWLEDGE-BASED SYSTEMS, 2022, 255
  • [4] User's Intention Understanding in Question-Answering System Using Attention-based LSTM
    Matsuyoshi, Yuki
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 1752 - 1755
  • [5] Attention-based Visual Question Generation
    Patil, Charulata
    Kulkarni, Anagha
    2021 INTERNATIONAL CONFERENCE ON EMERGING SMART COMPUTING AND INFORMATICS (ESCI), 2021, : 82 - 86
  • [6] Cascading Attention Visual Question Answering Model Based on Graph Structure
    Zhang, Haoyu
    Zhang, De
    COMPUTER ENGINEERING AND APPLICATIONS, 2023, 59 (06) : 155 - 161
  • [7] MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain
    Sharma, Dhruv
    Purushotham, Sanjay
    Reddy, Chandan K.
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [8] AttenWalker: Unsupervised Long-Document Question Answering via Attention-based Graph Walking
    Nie, Yuxiang
    Huang, Heyan
    Wei, Wei
    Mao, Xian-Ling
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 13650 - 13663
  • [9] An Attention-based Bi-LSTM Method for Visual Object Classification via EEG
    Zheng, Xiao
    Chen, Wanzhong
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2021, 63