Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm

Cited: 0
Authors
Alizadeh, Mehrdad [1 ]
Di Eugenio, Barbara [1 ]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Keywords
Visual Question Answering; verb semantics; data augmentation; deep learning; multi-task learning;
DOI
10.1142/S1793351X20400085
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Visual Question Answering (VQA) concerns providing answers to Natural Language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in visual processing, if the question focuses on events described by verbs, the language understanding component becomes crucial. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multi-task CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. Third, we employ an automatic semantic role labeler and annotate a subset of the VQA dataset (VQA_sub). This way, the proposed multi-task CNN-LSTM VQA model can be trained with VQA_sub as well. The results show a slight improvement over the single-task CNN-LSTM model.
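The abstract sketches the core architectural idea: a shared CNN-LSTM encoder whose fused image-question representation feeds two classifiers, one for the answer and one for the semantic frame element. Below is a minimal, hypothetical PyTorch sketch of that multi-task setup; the layer sizes, the element-wise fusion, and the loss weight alpha are illustrative assumptions, not the authors' published configuration.

```python
# Hypothetical sketch of a multi-task CNN-LSTM VQA model: shared encoders,
# two task heads (answer classification and frame-element classification).
import torch
import torch.nn as nn

class MultiTaskVQA(nn.Module):
    def __init__(self, vocab_size, num_answers, num_frame_elements,
                 embed_dim=300, hidden_dim=512):
        super().__init__()
        # Small CNN image encoder (stand-in for the pretrained CNN in the paper)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, hidden_dim),
        )
        # LSTM question encoder
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Task-specific heads on the fused representation
        self.answer_head = nn.Linear(hidden_dim, num_answers)
        self.frame_head = nn.Linear(hidden_dim, num_frame_elements)

    def forward(self, image, question_tokens):
        img_feat = self.cnn(image)                            # (B, hidden_dim)
        _, (h_n, _) = self.lstm(self.embed(question_tokens))  # h_n: (1, B, hidden_dim)
        fused = img_feat * h_n.squeeze(0)                     # element-wise fusion (assumed)
        return self.answer_head(fused), self.frame_head(fused)

def multitask_loss(ans_logits, frame_logits, ans_target, frame_target, alpha=0.5):
    """Joint objective: answer loss plus a weighted frame-element loss (alpha assumed)."""
    ce = nn.CrossEntropyLoss()
    return ce(ans_logits, ans_target) + alpha * ce(frame_logits, frame_target)
```

Because both heads back-propagate into the shared encoders, the frame-element objective shapes the representation used for answering, which is the mechanism the abstract credits for fewer inconsistent responses.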
Pages: 223-248
Page count: 26
Related papers (showing items 31-40 of 50)
  • [31] An Effective Deep Transfer Learning and Information Fusion Framework for Medical Visual Question Answering
    Liu, Feifan
    Peng, Yalei
    Rosen, Max P.
    EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION (CLEF 2019), 2019, 11696 : 238 - 247
  • [32] Information Retrieval Through the Web and Semantic Knowledge-Driven Automatic Question Answering System
    Sheshasaayee, Ananthi
    Jayalakshmi, S.
    PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON FRONTIERS IN INTELLIGENT COMPUTING: THEORY AND APPLICATIONS, (FICTA 2016), VOL 2, 2017, 516 : 621 - 632
  • [33] Visual Question Answering Research on Joint Knowledge and Visual Information Reasoning
    Su, Zhenqiang
    Gou, Gang
    COMPUTER ENGINEERING AND APPLICATIONS, 2024, 60 (05) : 95 - 102
  • [34] MMQL: Multi-Question Learning for Medical Visual Question Answering
    Chen, Qishen
    Bian, Minjie
    Xu, Huahu
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT V, 2024, 15005 : 480 - 489
  • [35] Bidirectional Contrastive Split Learning for Visual Question Answering
    Sun, Yuwei
    Ochiai, Hideya
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 19, 2024 : 21602 - 21609
  • [36] Learning to Specialize with Knowledge Distillation for Visual Question Answering
    Mun, Jonghwan
    Lee, Kimin
    Shin, Jinwoo
    Han, Bohyung
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [37] Bridging the Cross-Modality Semantic Gap in Visual Question Answering
    Wang, Boyue
    Ma, Yujian
    Li, Xiaoyan
    Gao, Junbin
    Hu, Yongli
    Yin, Baocai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (03) : 4519 - 4531
  • [38] Measuring Machine Intelligence Through Visual Question Answering
    Zitnick, C. Lawrence
    Agrawal, Aishwarya
    Antol, Stanislaw
    Mitchell, Margaret
    Batra, Dhruv
    Parikh, Devi
    AI MAGAZINE, 2016, 37 (01) : 63 - 72
  • [39] Multiple Context Learning Networks for Visual Question Answering
    Zhang, Pufen
    Lan, Hong
    Khan, Muhammad Asim
    SCIENTIFIC PROGRAMMING, 2022, 2022
  • [40] Adversarial Learning with Bidirectional Attention for Visual Question Answering
    Li, Qifeng
    Tang, Xinyi
    Jian, Yi
    SENSORS, 2021, 21 (21)