Incorporating 3D Information into Visual Question Answering

Cited by: 4
Authors
Qiu, Yue [1, 2]
Satoh, Yutaka [1, 2]
Suzuki, Ryota [1 ]
Kataoka, Hirokatsu [1 ]
Affiliations
[1] Natl Inst Adv Ind Sci & Technol, Tsukuba, Ibaraki, Japan
[2] Univ Tsukuba, Tsukuba, Ibaraki, Japan
DOI
10.1109/3DV.2019.00088
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose an approach to advancing the Visual Question Answering (VQA) task by incorporating 3D information from multi-view images. Conventional VQA approaches, which answer a natural-language question about a given RGB image in words, are weak at recognizing geometric information and therefore tend to fail at counting objects or inferring positional relationships. Moreover, they cannot reason about occluded space, so it is not feasible to equip robots that work in highly occluded real-world environments with VQA capability. To address these limitations, we introduce a new multi-view VQA dataset along with an approach that incorporates 3D scene information, captured directly from multi-view images, into VQA without using depth images or employing SLAM. Our proposed approach achieves strong performance, with an overall accuracy of 95.4% on the challenging multi-view VQA dataset setup, which contains relatively severe occlusion. This work also demonstrates the promise of bridging the gap between 3D vision and language.
Pages: 756-765
Page count: 10