Dynamic Memory Networks for Visual and Textual Question Answering

Cited by: 0
Authors
Xiong, Caiming [1]
Merity, Stephen [1]
Socher, Richard [1]
Affiliations
[1] Salesforce Inc, San Francisco, CA 94105 USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. One such architecture, the dynamic memory network (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.
Pages: 10