Attention Based Natural Language Grounding by Navigating Virtual Environment

Cited by: 7
Authors
Sinha, Abhishek [1]
Akilesh, B. [1, 2]
Sarkar, Mausoom [1]
Krishnamurthy, Balaji [1]
Affiliations
[1] Adobe Syst, Noida, India
[2] Univ Montreal, Mila, Montreal, PQ, Canada
DOI
10.1109/WACV.2019.00031
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
In this work, we focus on the problem of grounding language by training an agent to follow a set of natural language instructions and navigate to a target object in an environment. The agent receives visual information as raw pixels together with a natural language instruction specifying the task, and is trained end-to-end. We develop an attention mechanism for multi-modal fusion of the visual and textual modalities that allows the agent to learn to complete the task and achieve language grounding. Our experimental results show that this attention mechanism outperforms existing multi-modal fusion mechanisms proposed for both 2D and 3D environments on this task, in terms of both speed and success rate. We show that the learnt textual representations are semantically meaningful, as they follow vector arithmetic in the embedding space. The effectiveness of our attention approach over contemporary fusion mechanisms is also evident in the textual embeddings learnt by the different approaches. We also show that our model generalizes effectively to unseen scenarios and exhibits zero-shot generalization capabilities in both 2D and 3D environments. The code for our 2D environment, as well as the models that we developed for both 2D and 3D, is available at https://github.com/rl-lang-grounding/rl-lang-ground.
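The abstract does not spell out how the attention-based fusion is computed, so the following is only a minimal, generic sketch of the idea of fusing a visual feature map with an instruction embedding through attention. It uses plain dot-product attention over spatial cells, which is an assumption chosen for illustration and not necessarily the mechanism used in the paper. The function name `attention_fuse`, the tensor shapes, and the random inputs are all hypothetical.

```python
import numpy as np


def attention_fuse(visual, text):
    """Fuse a visual feature map with an instruction embedding via attention.

    Generic dot-product attention (an illustrative assumption, not the
    paper's confirmed method).

    visual: array of shape (H, W, D), one D-dimensional feature per cell.
    text:   array of shape (D,), the instruction embedding.
    Returns (attention map of shape (H, W), fused vector of shape (D,)).
    """
    h, w, d = visual.shape
    cells = visual.reshape(h * w, d)
    # Score each spatial cell by its dot product with the instruction embedding.
    scores = cells @ text
    # Softmax over spatial cells, shifted by the max for numerical stability.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()
    # Attention-weighted sum of the visual features.
    fused = weights @ cells
    return weights.reshape(h, w), fused


# Hypothetical inputs: a 4x4 feature map with 8 channels and an 8-dim embedding.
rng = np.random.default_rng(0)
visual = rng.normal(size=(4, 4, 8))
text = rng.normal(size=8)
attn, fused = attention_fuse(visual, text)
```

Here `attn` is a probability map over the 4x4 grid indicating which cells match the instruction most strongly, and `fused` is the single vector that a downstream policy network could consume. In an end-to-end setting, both the visual features and the instruction embedding would be produced by learned encoders rather than sampled at random.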
Pages: 236-244
Number of pages: 9