DAP: DOMAIN-AWARE PROMPT LEARNING FOR VISION-AND-LANGUAGE NAVIGATION

Authors
Liu, Ting [1]
Hu, Yue [1]
Wu, Wansen [1]
Wang, Youkai [1]
Xu, Kai [1]
Yin, Quanjun [1]
Affiliations
[1] National University of Defense Technology, College of Systems Engineering, Changsha, People's Republic of China
Keywords
vision-and-language; multimodal representation
DOI
10.1109/ICASSP48485.2024.10446504
Abstract
Following language instructions to navigate in unseen environments, the task known as vision-and-language navigation (VLN), is challenging for autonomous embodied agents. Because of their strong representation capabilities, pretrained vision-and-language models are widely used in VLN. However, most of them are trained on web-crawled, general-purpose datasets, which leaves a considerable domain gap when they are applied to VLN tasks. To address this problem, we propose a novel, model-agnostic Domain-Aware Prompt learning (DAP) framework. To equip pretrained models with the object-level and scene-level cross-modal alignment that VLN tasks require, DAP applies a low-cost prompt tuning paradigm that learns soft visual prompts for extracting in-domain image semantics. Specifically, we first generate a set of in-domain image-text pairs with the help of the CLIP model. We then introduce soft visual prompts in the input space of the visual encoder of a pretrained model. In this way, DAP efficiently injects in-domain visual knowledge into the visual encoder. Experimental results on both R2R and REVERIE show the superiority of DAP over existing state-of-the-art methods.
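The two steps named in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the embeddings are toy vectors rather than real CLIP outputs, `pseudo_label` stands in for CLIP-assisted pair generation by picking the candidate text most similar to an image embedding, and `PromptedEncoderInput` is a hypothetical wrapper that prepends learnable prompt tokens to the patch tokens a visual encoder would receive. It is not the authors' code.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pseudo_label(image_emb, text_embs):
    """Return the candidate text whose embedding is most similar to the image.

    Assumption: image_emb and text_embs live in a shared embedding space,
    as CLIP-style encoders would provide. Here they are toy values.
    """
    return max(text_embs, key=lambda text: cosine(image_emb, text_embs[text]))


class PromptedEncoderInput:
    """Hypothetical wrapper: prepends soft prompt tokens to patch tokens.

    In prompt tuning, only self.prompts would be updated during training,
    while the encoder that consumes the resulting sequence stays frozen.
    """

    def __init__(self, num_prompts, dim, seed=0):
        rng = random.Random(seed)
        # Small random initialisation for the learnable prompt vectors.
        self.prompts = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_prompts)
        ]

    def __call__(self, patch_tokens):
        # Output sequence: [prompt_1, ..., prompt_k, patch_1, ..., patch_n].
        return [list(p) for p in self.prompts] + [list(t) for t in patch_tokens]


# Step 1 (toy): pair an in-domain image with its best-matching text.
text_embs = {
    "a photo of a kitchen": [1.0, 0.0, 0.0],
    "a photo of a bedroom": [0.0, 1.0, 0.0],
}
image_emb = [0.9, 0.1, 0.0]
label = pseudo_label(image_emb, text_embs)

# Step 2 (toy): insert soft visual prompts in the encoder's input space.
patches = [[0.1, 0.2, 0.3, 0.4] for _ in range(6)]
wrapper = PromptedEncoderInput(num_prompts=2, dim=4)
tokens = wrapper(patches)
```

In this sketch the prompted sequence is longer than the patch sequence by the number of prompts, and the patch tokens themselves pass through unchanged, which is what keeps the approach low-cost: the trainable state is limited to the prompt vectors.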
Pages: 2615-2619
Number of pages: 5