Exploring Models and Data for Image Question Answering

Authors
Ren, Mengye [1 ]
Kiros, Ryan [1 ]
Zemel, Richard S. [1 ,2 ]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Canadian Inst Adv Res, Quebec City, PQ, Canada
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work aims to address the problem of image-based question-answering (QA) with new models and datasets. In our work, we propose to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images. Our model performs 1.8 times better than the only published results on an existing image QA dataset. We also present a question generation algorithm that converts image descriptions, which are widely available, into QA form. We used this algorithm to produce an order-of-magnitude larger dataset, with more evenly distributed answers. A suite of baseline results on this new dataset is also presented.
Pages: 9