Object-aware navigation for remote embodied visual referring expression

Times cited: 2
Authors
Zhan, Zhaohuan [1]
Lin, Liang [2]
Tan, Guang [1]
Affiliations
[1] Sun Yat Sen Univ, Shenzhen Campus, Shenzhen, Guangdong, Peoples R China
[2] Sun Yat Sen Univ, Guangzhou, Guangdong, Peoples R China
Keywords
Vision-language navigation; Referring expression; Multimodal processing
DOI
10.1016/j.neucom.2022.10.026
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
In the Remote Embodied Visual Referring Expression (REVERIE) task, an agent must navigate through an unseen environment to identify a referred object by following high-level instructions. Despite recent progress in vision-and-language navigation (VLN), previous methods commonly rely on detailed navigational instructions, which may not be available in practice. To address this issue, we present a method that strengthens vision-and-language (V&L) navigators with object awareness. By combining object-aware textual grounding and visual grounding operations, our technique helps the navigator recognize the relationship between instructions and the contents of captured images. As a generic method, the proposed solution can be seamlessly integrated into other V&L navigators built on different frameworks (for example, Seq2Seq or BERT). To alleviate data scarcity, we synthesize augmented data using a simple yet effective prompt template that retains object and destination information. Experimental results on the REVERIE and R2R datasets demonstrate the applicability of the proposed methods and their performance improvements across different domains.

(c) 2022 Elsevier B.V. All rights reserved.
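The abstract mentions synthesizing augmented instructions from a prompt template that keeps object and destination information, but it does not give the template itself. The following is a hypothetical illustration of that idea only, not the authors' implementation: the template string, the `synthesize_instruction` and `augment` helpers, and the `path_id`/`object`/`destination` field names are all assumptions.

```python
from typing import Dict, List

# HYPOTHETICAL template: the paper's actual wording is not given in the abstract.
# It keeps only the two pieces of information the abstract says are retained:
# the destination (e.g., a room) and the target object.
TEMPLATE = "Go to the {destination} and find the {object}."


def synthesize_instruction(obj: str, destination: str, template: str = TEMPLATE) -> str:
    """Fill the (hypothetical) template with an object name and a destination name."""
    return template.format(object=obj, destination=destination)


def augment(samples: List[Dict[str, str]], template: str = TEMPLATE) -> List[Dict[str, str]]:
    """Return one synthetic instruction per sample.

    Each sample is assumed (hypothetically) to carry 'path_id', 'object',
    and 'destination' keys, e.g. taken from trajectory annotations.
    """
    augmented = []
    for s in samples:
        augmented.append({
            "path_id": s["path_id"],
            "instruction": synthesize_instruction(s["object"], s["destination"], template),
        })
    return augmented


if __name__ == "__main__":
    demo = [{"path_id": "p1", "object": "pillow", "destination": "bedroom"}]
    for item in augment(demo):
        print(item["instruction"])  # Go to the bedroom and find the pillow.
```

Because this assumed template describes only the goal and omits any step-by-step route, its outputs resemble the high-level style of REVERIE instructions rather than detailed navigational instructions.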
Pages: 68-78
Number of pages: 11