DAP: DOMAIN-AWARE PROMPT LEARNING FOR VISION-AND-LANGUAGE NAVIGATION

Times cited: 0

Authors:

Liu, Ting [1]
Hu, Yue [1]
Wu, Wansen [1]
Wang, Youkai [1]
Xu, Kai [1]
Yin, Quanjun [1]

Affiliations:

[1] Natl Univ Def Technol, Coll Syst Engn, Changsha, Peoples R China

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024

Keywords:

vision-and-language; multimodal representation

DOI:

10.1109/ICASSP48485.2024.10446504

Abstract:

Following language instructions to navigate in unseen environments is a challenging task for autonomous embodied agents. With strong representation capabilities, pretrained vision-and-language models are widely used in vision-and-language navigation (VLN). However, most of them are trained on web-crawled general-purpose datasets, which incurs a considerable domain gap when used for VLN tasks. To address the problem, we propose a novel and model-agnostic Domain-Aware Prompt learning (DAP) framework. To equip the pretrained models with specific object-level and scene-level cross-modal alignment in VLN tasks, DAP applies a low-cost prompt tuning paradigm to learn soft visual prompts for extracting in-domain image semantics. Specifically, we first generate a set of in-domain image-text pairs with the help of the CLIP model. Then we introduce soft visual prompts in the input space of the visual encoder in a pretrained model. DAP injects in-domain visual knowledge into the visual encoder of the pretrained model in an efficient way. Experimental results on both R2R and REVERIE show the superiority of DAP compared to existing state-of-the-art methods.
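
The two steps named in the abstract (pairing in-domain images with text via CLIP-style similarity, then placing soft visual prompts in the input space of the visual encoder) can be illustrated with a minimal toy sketch. This is not the authors' implementation: the function names `pair_image_with_text` and `prepend_soft_prompts` are hypothetical, the vectors are hand-made stand-ins for CLIP embeddings, and no training is performed. It only shows the shape of the idea under those assumptions.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pair_image_with_text(image_emb, text_embs):
    """Hypothetical helper: index of the candidate text most similar to the image.

    Stands in for CLIP-style image-text matching; real CLIP embeddings
    would replace the toy vectors used below.
    """
    return max(range(len(text_embs)), key=lambda i: cosine(image_emb, text_embs[i]))


def prepend_soft_prompts(patch_tokens, prompt_tokens):
    """Hypothetical helper: put prompt tokens in front of the patch tokens.

    In prompt tuning, only the prompt tokens would be updated while the
    visual encoder stays frozen; here we only build the input sequence.
    """
    return [list(p) for p in prompt_tokens] + [list(t) for t in patch_tokens]


# Step 1 (assumed toy data): match an image embedding to candidate text labels.
image_emb = [0.9, 0.1, 0.0]
text_embs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
best = pair_image_with_text(image_emb, text_embs)

# Step 2 (assumed sizes): 6 patch tokens and 2 soft prompt tokens of width 4.
rng = random.Random(0)
dim = 4
patch_tokens = [[0.0] * dim for _ in range(6)]
prompt_tokens = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(2)]
tokens = prepend_soft_prompts(patch_tokens, prompt_tokens)
```

In this sketch the prompt tokens are the only values that a tuning procedure would change, which is what keeps the approach low-cost relative to fine-tuning the whole encoder.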

Pages: 2615-2619

Page count: 5
