Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm

Cited by: 0
|
Authors
Alizadeh, Mehrdad [1 ]
Di Eugenio, Barbara [1 ]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Keywords
Visual Question Answering; verb semantics; data augmentation; deep learning; multi-task learning;
DOI
10.1142/S1793351X20400085
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) concerns providing answers to Natural Language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in visual processing, if the question focuses on events described by verbs, the language understanding component becomes crucial. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multi-task CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. Third, we employ an automatic semantic role labeler and annotate a subset of the VQA dataset (VQA(sub)). This way, the proposed multi-task CNN-LSTM VQA model can be trained with the VQA(sub) as well. The results show a slight improvement over the single-task CNN-LSTM model.
Pages: 223-248
Page count: 26
Related Papers
50 records total
  • [1] Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach
    Alizadeh, Mehrdad
    Di Eugenio, Barbara
    2020 IEEE 14TH INTERNATIONAL CONFERENCE ON SEMANTIC COMPUTING (ICSC 2020), 2020, : 37 - 44
  • [2] Multitask Learning for Visual Question Answering
    Ma, Jie
    Liu, Jun
    Lin, Qika
    Wu, Bei
    Wang, Yaxian
    You, Yang
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (03) : 1380 - 1394
  • [3] Incorporating 3D Information into Visual Question Answering
    Qiu, Yue
    Satoh, Yutaka
    Suzuki, Ryota
    Kataoka, Hirokatsu
    2019 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2019), 2019, : 756 - 765
  • [4] A Corpus for Visual Question Answering Annotated with Frame Semantic Information
    Alizadeh, Mehrdad
    Di Eugenio, Barbara
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 5524 - 5531
  • [5] Multitask learning for neural generative question answering
    Huang, Yanzhou
    Zhong, Tao
    MACHINE VISION AND APPLICATIONS, 2018, 29 (06) : 1009 - 1017
  • [6] Learning visual question answering on controlled semantic noisy labels
    Zhang, Haonan
    Zeng, Pengpeng
    Hu, Yuxuan
    Qian, Jin
    Song, Jingkuan
    Gao, Lianli
    PATTERN RECOGNITION, 2023, 138
  • [8] Incorporating Domain Knowledge and Semantic Information into Language Models for Commonsense Question Answering
    Zhou, Ruiying
    Tian, Keke
    Lai, Hanjiang
    Yin, Jian
    PROCEEDINGS OF THE 2021 IEEE 24TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN (CSCWD), 2021, : 1160 - 1165
  • [9] Improving Visual Question Answering by Semantic Segmentation
    Pham, Viet-Quoc
    Mishima, Nao
    Nakasu, Toshiaki
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT III, 2021, 12893 : 459 - 470
  • [10] STRUCTURED SEMANTIC REPRESENTATION FOR VISUAL QUESTION ANSWERING
    Yu, Dongchen
    Gao, Xing
    Xiong, Hongkai
    2018 25TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2018, : 2286 - 2290