Scene Text Visual Question Answering

Cited by: 145
Authors
Biten, Ali Furkan [1]
Tito, Ruben [1]
Mafla, Andres [1]
Gomez, Lluis [1]
Rusinol, Marcal [1]
Valveny, Ernest [1]
Jawahar, C. V. [2]
Karatzas, Dimosthenis [1]
Affiliations
[1] UAB, Comp Vis Ctr, Barcelona, Spain
[2] IIIT Hyderabad, CVIT, Hyderabad, India
DOI
10.1109/ICCV.2019.00439
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks that accounts for both reasoning errors and shortcomings of the text recognition module. In addition, we put forward a series of baseline methods, which provide further insight into the newly released dataset and set the scene for further research.
Pages: 4290-4300
Page count: 11
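
The abstract does not name or define the proposed evaluation metric. The published paper calls it Average Normalized Levenshtein Similarity (ANLS): each predicted answer is scored by its string similarity to the closest ground-truth answer, and scores below a threshold are set to zero, so small text recognition slips are penalized softly while wrong answers score nothing. The following is a minimal sketch under stated assumptions, not the authors' evaluation code: the function names are hypothetical, the threshold of 0.5 is the value reported in the paper as far as we know, and the lowercasing and whitespace stripping are assumed normalization steps.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if char_a == char_b else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def similarity(prediction: str, truth: str, threshold: float = 0.5) -> float:
    """1 - normalized edit distance, or 0 when the distance reaches the threshold."""
    # Assumed normalization: case-insensitive, surrounding whitespace ignored.
    p, t = prediction.strip().lower(), truth.strip().lower()
    longest = max(len(p), len(t))
    if longest == 0:
        return 1.0
    normalized = levenshtein(p, t) / longest
    return 1.0 - normalized if normalized < threshold else 0.0


def anls(predictions: List[str], ground_truths: List[List[str]],
         threshold: float = 0.5) -> float:
    """Mean over questions of the best similarity to any accepted answer.

    Assumes every question has at least one ground-truth answer.
    """
    if not predictions:
        return 0.0
    scores = [
        max(similarity(prediction, truth, threshold) for truth in answers)
        for prediction, answers in zip(predictions, ground_truths)
    ]
    return sum(scores) / len(scores)
```

For example, `similarity("st0p", "stop")` gives partial credit for a single misread character, whereas an unrelated answer such as `"exit"` is scored as zero.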