Incorporating Verb Semantic Information in Visual Question Answering Through Multitask Learning Paradigm

Cited by: 0
Authors
Alizadeh, Mehrdad [1]
Di Eugenio, Barbara [1]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Chicago, IL 60607 USA
Keywords
Visual Question Answering; verb semantics; data augmentation; deep learning; multi-task learning;
DOI
10.1142/S1793351X20400085
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Visual Question Answering (VQA) concerns providing answers to Natural Language questions about images. Several deep neural network approaches have been proposed to model the task in an end-to-end fashion. Whereas the task is grounded in visual processing, if the question focuses on events described by verbs, the language understanding component becomes crucial. Our hypothesis is that models should be aware of verb semantics, as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no VQA dataset exists that includes verb semantic information. Our first contribution is a new VQA dataset (imSituVQA) that we built by taking advantage of the imSitu annotations. The imSitu dataset consists of images manually labeled with semantic frame elements, mostly taken from FrameNet. Second, we propose a multi-task CNN-LSTM VQA model that learns to classify the answers as well as the semantic frame elements. Our experiments show that semantic frame element classification helps the VQA system avoid inconsistent responses and improves performance. Third, we employ an automatic semantic role labeler and annotate a subset of the VQA dataset (VQA(sub)). This way, the proposed multi-task CNN-LSTM VQA model can be trained with the VQA(sub) as well. The results show a slight improvement over the single-task CNN-LSTM model.
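The multi-task idea in the abstract, one model trained on answer classification and semantic frame element classification together, can be illustrated with a combined loss. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the plain weighted sum of two cross-entropy terms, and the `aux_weight` parameter are all hypothetical, and the logits stand in for the outputs of two classifier heads on a shared CNN-LSTM encoder.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[target_index])


def multitask_loss(answer_logits, answer_target,
                   frame_logits, frame_target, aux_weight=0.5):
    """Hypothetical joint objective: answer loss plus a weighted
    frame-element loss. aux_weight is an assumed hyperparameter,
    not a value taken from the paper."""
    answer_loss = cross_entropy(answer_logits, answer_target)
    frame_loss = cross_entropy(frame_logits, frame_target)
    return answer_loss + aux_weight * frame_loss


# Example: scores from two heads for one (image, question) pair.
loss = multitask_loss(
    answer_logits=[2.0, 0.5, -1.0], answer_target=0,
    frame_logits=[0.1, 1.5, 0.3, -0.2], frame_target=1,
)
print(f"joint loss: {loss:.4f}")
```

Setting `aux_weight` to 0 reduces this to a single-task answer loss, which is the kind of baseline the abstract compares against.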
Pages: 223-248
Page count: 26