3D Question Answering

Cited by: 2
Authors
Ye, Shuquan [1]
Chen, Dongdong [2]
Han, Songfang [3]
Liao, Jing [1]
Affiliations
[1] City Univ Hong Kong, Kowloon Tong, Hong Kong, Peoples R China
[2] Microsoft Cloud AI, Redmond, WA 98052 USA
[3] Univ Calif San Diego, La Jolla, CA 92093 USA
Keywords
Point cloud; scene understanding; language; vision
DOI
10.1109/TVCG.2022.3225327
Chinese Library Classification number
TP31 [Computer software]
Discipline classification codes
081202; 0835
Abstract
Visual question answering (VQA) has experienced tremendous progress in recent years. However, most efforts have focused only on 2D image question-answering tasks. In this article, we extend VQA to its 3D counterpart, 3D question answering (3DQA), which can facilitate a machine's perception of real-world 3D scenarios. Unlike 2D image VQA, 3DQA takes a colored point cloud as input and requires comprehension of both appearance and 3D geometry to answer 3D-related questions. To this end, we propose a novel transformer-based 3DQA framework, "3DQA-TR", which consists of two encoders that exploit the appearance and geometry information, respectively. The multi-modal information about the appearance, the geometry, and the linguistic question then attends to each other via a 3D-linguistic BERT to predict the target answers. To verify the effectiveness of the proposed 3DQA framework, we further develop the first 3DQA dataset, "ScanQA", which builds on the ScanNet dataset and contains over 10K question-answer pairs for 806 scenes. To the best of our knowledge, ScanQA is the first large-scale, fully human-annotated dataset with natural-language questions and free-form answers in 3D environments. We also use several visualizations and experiments to investigate the diversity of the collected questions and the significant differences between this task and both 2D VQA and 3D captioning. Extensive experiments on this dataset demonstrate the clear superiority of the proposed 3DQA framework over state-of-the-art VQA frameworks and the effectiveness of our major designs. To facilitate research in this direction, the code and dataset are publicly available at http://shuquanye.com/3DQA_website/.
Pages: 1772-1786
Page count: 15