Object-aware navigation for remote embodied visual referring expression

Cited by: 2
Authors
Zhan, Zhaohuan [1]
Lin, Liang [2]
Tan, Guang [1]
Affiliations
[1] Sun Yat-sen Univ, Shenzhen Campus, Shenzhen, Guangdong, Peoples R China
[2] Sun Yat-sen Univ, Guangzhou, Guangdong, Peoples R China
Keywords
Vision-language navigation; Referring expression; Multimodal processing
DOI
10.1016/j.neucom.2022.10.026
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the Remote Embodied Visual Referring Expression (REVERIE) task, an agent needs to navigate through an unseen environment to identify a referred object following high-level instructions. Despite recent efforts of vision-and-language navigation (VLN), previous methods commonly rely on detailed navigational instructions, which might not be available in practice. To address this issue, we present a method that strengthens vision-and-language (V&L) navigators with object-awareness. By combining object-aware textual grounding and visual grounding operations, our technique helps the navigator recognize the relationship between instructions and the contents of captured images. As a generic method, the proposed solution can be seamlessly integrated into other V&L navigators with different frameworks (for example, Seq2Seq or BERT). In order to alleviate the problem of data scarcity, we synthesize augmented data based on a simple yet effective prompt template that retains object information and destination information. Experimental results on REVERIE and R2R datasets demonstrate the proposed methods' applicability and performance improvement across different domains. (c) 2022 Elsevier B.V. All rights reserved.
Pages: 68-78
Page count: 11
Related papers
50 items in total
  • [21] Symmetry-aware Neural Architecture for Embodied Visual Navigation
    Liu, Shuang
    Suganuma, Masanori
    Okatani, Takayuki
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (04) : 1091 - 1107
  • [22] Room-Object Entity Prompting and Reasoning for Embodied Referring Expression
    Gao, Chen
    Liu, Si
    Chen, Jinyu
    Wang, Luting
    Wu, Qi
    Li, Bo
    Tian, Qi
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (02) : 994 - 1010
  • [23] HROM: Learning High-Resolution Representation and Object-Aware Masks for Visual Object Tracking
    Zhang, Dawei
    Zheng, Zhonglong
    Wang, Tianxiang
    He, Yiran
    SENSORS, 2020, 20 (17) : 1 - 20
  • [24] Object-aware Dense Semantic Correspondence
    Yang, Fan
    Li, Xin
    Cheng, Hong
    Li, Jianping
    Chen, Leiting
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 4151 - 4159
  • [25] OCTET: Object-aware Counterfactual Explanations
    Zemni, Mehdi
    Chen, Mickael
    Zablocki, Eloi
    Ben-Younes, Hedi
    Perez, Patrick
    Cord, Matthieu
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15062 - 15071
  • [26] Towards Object-Aware Development Tools
    Chis, Andrei
    COMPANION PROCEEDINGS OF THE 2016 ACM SIGPLAN INTERNATIONAL CONFERENCE ON SYSTEMS, PROGRAMMING, LANGUAGES AND APPLICATIONS: SOFTWARE FOR HUMANITY (SPLASH COMPANION'16), 2016, : 65 - 66
  • [27] DYNAMIC OBJECT-AWARE MONOCULAR VISUAL ODOMETRY WITH LOCAL AND GLOBAL INFORMATION AGGREGATION
    Wan, Yiming
    Gao, Wei
    Han, Sheng
    Wu, Yihong
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 603 - 607
  • [28] Object-aware semantics of attention for image captioning
    Shiwei Wang
    Long Lan
    Xiang Zhang
    Guohua Dong
    Zhigang Luo
    Multimedia Tools and Applications, 2020, 79 : 2013 - 2030
  • [29] An Object-Aware Network Embedding Deep Superpixel for Semantic Segmentation of Remote Sensing Images
    Ye, Ziran
    Lin, Yue
    Dong, Baiyu
    Tan, Xiangfeng
    Dai, Mengdi
    Kong, Dedong
    REMOTE SENSING, 2024, 16 (20)
  • [30] Object-Aware Adaptive Convolution Kernel Attention Mechanism in Siamese Network for Visual Tracking
    Yuan, Dongliang
    Li, Qingdang
    Yang, Xiaohui
    Zhang, Mingyue
    Sun, Zhen
    APPLIED SCIENCES-BASEL, 2022, 12 (02)