Exploring Models and Data for Image Question Answering

Citations: 0
Authors
Ren, Mengye [1]
Kiros, Ryan [1]
Zemel, Richard S. [1,2]
Affiliations
[1] Univ Toronto, Toronto, ON, Canada
[2] Canadian Inst Adv Res, Quebec City, PQ, Canada
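Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract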
This work aims to address the problem of image-based question-answering (QA) with new models and datasets. In our work, we propose to use neural networks and visual semantic embeddings, without intermediate stages such as object detection and image segmentation, to predict answers to simple questions about images. Our model performs 1.8 times better than the only published results on an existing image QA dataset. We also present a question generation algorithm that converts image descriptions, which are widely available, into QA form. We used this algorithm to produce an order-of-magnitude larger dataset, with more evenly distributed answers. A suite of baseline results on this new dataset is also presented.
Pages: 9
Related papers
50 in total
  • [1] MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
    Min, Juhong
    Buch, Shyamal
    Nagrani, Arsha
    Cho, Minsu
    Schmid, Cordelia
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 13235 - 13245
  • [2] Training Question Answering Models From Synthetic Data
    Puri, Raul
    Spring, Ryan
    Shoeybi, Mohammad
    Patwary, Mostofa
    Catanzaro, Bryan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 5811 - 5826
  • [4] Exploring Answer Information for Question Classification in Community Question Answering
    Wang, Jian
    Lin, Hongfei
    Dong, Hualei
    Xiong, Daping
    Yang, Zhihao
    JOURNAL OF MULTIPLE-VALUED LOGIC AND SOFT COMPUTING, 2018, 31 (1-2) : 67 - 84
  • [5] Exploring Entities in Event Detection as Question Answering
    Boros, Emanuela
    Moreno, Jose G.
    Doucet, Antoine
    ADVANCES IN INFORMATION RETRIEVAL, PT I, 2022, 13185 : 65 - 79
  • [6] Exploring syntactic relation patterns for question answering
    Shen, D
    Kruijff, GJM
    Klakow, D
    NATURAL LANGUAGE PROCESSING - IJCNLP 2005, PROCEEDINGS, 2005, 3651 : 507 - 518
  • [7] Image captioning improved visual question answering
    Himanshu Sharma
    Anand Singh Jalal
    Multimedia Tools and Applications, 2022, 81 : 34775 - 34796
  • [8] Data Augmentation Method for Question Answering
    Ding J.
    Xiao K.
    Ye H.
    Zhou X.
    Zhang M.
    Beijing Daxue Xuebao (Ziran Kexue Ban)/Acta Scientiarum Naturalium Universitatis Pekinensis, 2022, 58 (01): : 54 - 60
  • [9] Image captioning improved visual question answering
    Sharma, Himanshu
    Jalal, Anand Singh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (24) : 34775 - 34796
  • [10] Multimodal deep fusion for image question answering
    Zhang, Weifeng
    Yu, Jing
    Wang, Yuxia
    Wang, Wei
    KNOWLEDGE-BASED SYSTEMS, 2021, 212