Incorporating 3D Information into Visual Question Answering

被引:4
|
作者
Qiu, Yue [1 ,2 ]
Satoh, Yutaka [1 ,2 ]
Suzuki, Ryota [1 ]
Kataoka, Hirokatsu [1 ]
机构
[1] Natl Inst Adv Ind Sci & Technol, Tsukuba, Ibaraki, Japan
[2] Univ Tsukuba, Tsukuba, Ibaraki, Japan
关键词
D O I
10.1109/3DV.2019.00088
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We propose a tactic of advancing Visual Question Answering (VQA) task by incorporating 3D information via multi-view images. Conventional VQA approaches, which reply an answer in words against a linguistic question about a given RGB image, have less ability to recognize geometrical information so that they tend to fail to count things or guess positional relationship. Moreover, they have no ability to determine blinded space, so it is not feasible to invent VQA function to robots which will work in highly-occluded real-world environments. To achieve the situation, we introduce a new multi-view VQA dataset along with an approach that incorporating 3D scene information directly captured from multi-view images into VQA without using depth images or employing SLAM. Our proposed approach achieves strong performance with an overall accuracy of 95.4% on the challenging multi-view VQA dataset setup, which contains relatively severe occlusion. This work also demonstrates the promising aspects of bridging the gap between 3D vision and language.
引用
收藏
页码:756 / 765
页数:10
相关论文
共 50 条
  • [21] Visual Question Answering A tutorial
    Teney, Damien
    Wu, Qi
    van den Hengel, Anton
    IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (06) : 63 - 75
  • [22] Visual Question Generation as Dual Task of Visual Question Answering
    Li, Yikang
    Duan, Nan
    Zhou, Bolei
    Chu, Xiao
    Ouyang, Wanli
    Wang, Xiaogang
    Zhou, Ming
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6116 - 6124
  • [23] Improving visual question answering by combining scene-text information
    Sharma, Himanshu
    Jalal, Anand Singh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (09) : 12177 - 12208
  • [24] VIG: Visual Information-Guided Knowledge-Based Visual Question Answering
    Liu, Heng
    Wang, Boyue
    Sun, Yanfeng
    Li, Xiaoyan
    Hu, Yongli
    Yin, Baocai
    PROCEEDINGS OF THE 2024 27 TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 1086 - 1091
  • [25] CAPTURING GLOBAL AND LOCAL INFORMATION IN REMOTE SENSING VISUAL QUESTION ANSWERING
    Guo, Yan
    Huang, Yuancheng
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 6340 - 6343
  • [26] Improving visual question answering by combining scene-text information
    Himanshu Sharma
    Anand Singh Jalal
    Multimedia Tools and Applications, 2022, 81 : 12177 - 12208
  • [27] Sequential Visual Reasoning for Visual Question Answering
    Liu, Jinlai
    Wu, Chenfei
    Wang, Xiaojie
    Dong, Xuan
    PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018, : 410 - 415
  • [28] Question Answering with BERT: designing a 3D virtual avatar for Cultural Heritage exploration
    Farella, Mariella
    Chiazzese, Giuseppe
    Lo Bosco, Giosue
    2022 IEEE 21ST MEDITERRANEAN ELECTROTECHNICAL CONFERENCE (IEEE MELECON 2022), 2022, : 770 - 774
  • [29] An Improved Attention for Visual Question Answering
    Rahman, Tanzila
    Chou, Shih-Han
    Sigal, Leonid
    Carenini, Giuseppe
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 1653 - 1662
  • [30] Robust Explanations for Visual Question Answering
    Patro, Badri N.
    Patel, Shivansh
    Namboodiri, Vinay P.
    2020 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2020, : 1566 - 1575