Embodied Question Answering

Cited by: 14
Authors
Das, Abhishek [1, 2]
Datta, Samyak [1]
Gkioxari, Georgia [2]
Lee, Stefan [1]
Parikh, Devi [1, 2]
Batra, Dhruv [1, 2]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
[2] Facebook AI Res, Menlo Pk, CA USA
DOI
10.1109/CVPRW.2018.00279
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a new AI task - Embodied Question Answering (EmbodiedQA) - where an agent is spawned at a random location in a 3D environment and asked a question ('What color is the car?'). In order to answer, the agent must first intelligently navigate to explore the environment, gather necessary visual information through first-person (egocentric) vision, and then answer the question ('orange'). EmbodiedQA requires a range of AI skills - language understanding, visual recognition, active perception, goal-driven navigation, commonsense reasoning, long-term memory, and grounding language into actions. In this work, we develop a dataset of questions and answers in House3D environments [1], evaluation metrics, and a hierarchical model trained with imitation and reinforcement learning.
Pages: 2135-2144
Page count: 10
Related papers
50 in total
  • [1] Embodied Question Answering
    Das, Abhishek
    Datta, Samyak
    Gkioxari, Georgia
    Lee, Stefan
    Parikh, Devi
    Batra, Dhruv
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 1 - 10
  • [2] Knowledge-Based Embodied Question Answering
    Tan, Sinan
    Ge, Mengmeng
    Guo, Di
    Liu, Huaping
    Sun, Fuchun
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) : 11948 - 11960
  • [3] Multi-Target Embodied Question Answering
    Yu, Licheng
    Chen, Xinlei
    Gkioxari, Georgia
    Bansal, Mohit
    Berg, Tamara L.
    Batra, Dhruv
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6302 - 6311
  • [4] Embodied Question Answering in Photorealistic Environments with Point Cloud Perception
    Wijmans, Erik
    Datta, Samyak
    Maksymets, Oleksandr
    Das, Abhishek
    Gkioxari, Georgia
    Lee, Stefan
    Essa, Irfan
    Parikh, Devi
    Batra, Dhruv
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6652 - 6661
  • [5] Embodied Referring Expression for Manipulation Question Answering in Interactive Environment
    Sima, Qie
    Tan, Sinan
    Liu, Huaping
    Sun, Fuchun
    Xu, Weifeng
    Fu, Ling
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 7635 - 7641
  • [6] Depth and Video Segmentation Based Visual Attention for Embodied Question Answering
    Luo, Haonan
    Lin, Guosheng
    Yao, Yazhou
    Liu, Fayao
    Liu, Zichuan
    Tang, Zhenmin
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (06) : 6807 - 6819
  • [7] SegEQA: Video Segmentation Based Visual Attention for Embodied Question Answering
    Luo, Haonan
    Lin, Guosheng
    Liu, Zichuan
    Liu, Fayao
    Tang, Zhenmin
    Yao, Yazhou
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 9666 - 9675
  • [8] Robust-EQA: Robust Learning for Embodied Question Answering With Noisy Labels
    Luo, Haonan
    Lin, Guosheng
    Shen, Fumin
    Huang, Xingguo
    Yao, Yazhou
    Shen, Hengtao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (09) : 12083 - 12094
  • [9] Multi-Timestep-Ahead Prediction with Mixture of Experts for Embodied Question Answering
    Suzuki, Kanata
    Kamiwano, Yuya
    Chiba, Naoya
    Mori, Hiroki
    Ogata, Tetsuya
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT VI, 2023, 14259 : 243 - 255
  • [10] Turkish question answering - Question answering for distance education students
    Yurekli, Burcu
    Arslan, Ahmet
    Senel, Hakan G.
    Yilmazel, Ozgur
    ICSOFT 2008: PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON SOFTWARE AND DATA TECHNOLOGIES, VOL ISDM/ABF, 2008, : 320 - +