Object-aware navigation for remote embodied visual referring expression

Cited by: 2
Authors
Zhan, Zhaohuan [1]
Lin, Liang [2]
Tan, Guang [1]
Affiliations
[1] Sun Yat-sen Univ, Shenzhen Campus, Shenzhen, Guangdong, Peoples R China
[2] Sun Yat-sen Univ, Guangzhou, Guangdong, Peoples R China
Keywords
Vision-language navigation; Referring expression; Multimodal processing
DOI
10.1016/j.neucom.2022.10.026
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the Remote Embodied Visual Referring Expression (REVERIE) task, an agent needs to navigate through an unseen environment to identify a referred object by following high-level instructions. Despite recent progress in vision-and-language navigation (VLN), previous methods commonly rely on detailed navigational instructions, which might not be available in practice. To address this issue, we present a method that strengthens vision-and-language (V&L) navigators with object-awareness. By combining object-aware textual grounding and visual grounding operations, our technique helps the navigator recognize the relationship between instructions and the contents of captured images. As a generic method, the proposed solution can be seamlessly integrated into other V&L navigators with different frameworks (for example, Seq2Seq or BERT). To alleviate the problem of data scarcity, we synthesize augmented data based on a simple yet effective prompt template that retains object information and destination information. Experimental results on the REVERIE and R2R datasets demonstrate the proposed method's applicability and performance improvement across different domains. (c) 2022 Elsevier B.V. All rights reserved.
Pages: 68-78
Page count: 11
    2018 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (IEEE SCC 2018), 2018, : 253 - 256