Practical Cross-modal Manifold Alignment for Robotic Grounded Language Learning

Cited by: 2
Authors
Nguyen, Andre T. [1 ,2 ]
Richards, Luke E. [1 ,2 ]
Kebe, Gaoussou Youssouf [2 ]
Raff, Edward [1 ,2 ]
Darvish, Kasra [2 ]
Ferraro, Frank [2 ]
Matuszek, Cynthia [2 ]
Affiliations
[1] Booz Allen Hamilton, Mclean, VA 22102 USA
[2] Univ Maryland Baltimore Cty, Baltimore, MD 21228 USA
DOI
10.1109/CVPRW53098.2021.00177
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a cross-modality manifold alignment procedure that leverages triplet loss to jointly learn consistent, multi-modal embeddings of language-based concepts of real-world items. Our approach learns these embeddings by sampling triples of anchor, positive, and negative data points from RGB-depth images and their natural language descriptions. We show that our approach can benefit from, but does not require, post-processing steps such as Procrustes analysis, in contrast to some of our baselines which require it for reasonable performance. We demonstrate the effectiveness of our approach on two datasets commonly used to develop robotic-based grounded language learning systems, where our approach outperforms four baselines, including a state-of-the-art approach, across five evaluation metrics.
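The abstract describes aligning vision and language embeddings with a triplet loss over (anchor, positive, negative) samples. Below is a minimal, illustrative sketch of such a cross-modal triplet margin loss, assuming Euclidean distances between fixed-size embedding vectors; the function name, margin value, and inputs are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def cross_modal_triplet_loss(anchor, positive, negative, margin=1.0):
    """Illustrative triplet margin loss for cross-modal alignment.

    anchor:   embedding from one modality (e.g. an RGB-depth view)
    positive: matching embedding from the other modality (e.g. its
              language description)
    negative: non-matching embedding from the other modality

    The loss pulls the positive within `margin` of the anchor relative
    to the negative; it is zero once the negative is at least `margin`
    farther away than the positive.
    """
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)
```

Training on many such triples encourages matching image and language embeddings to land near each other on a shared manifold, which is why post-processing alignment steps like Procrustes analysis become optional rather than required.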
Pages: 1613-1622
Page count: 10
Related papers
50 results in total
  • [31] CoCM: Conditional Cross-Modal Learning for Vision-Language Models
    Yang, Juncheng
    Xie, Shuai
    Li, Shuxia
    Cai, Zengyu
    Li, Yijia
    Zhu, Weiping
    ELECTRONICS, 2025, 14 (01):
  • [32] LEARNING CROSS-MODAL REPRESENTATIONS FOR LANGUAGE-BASED IMAGE MANIPULATION
    Ak, Kenan E.
    Sun, Ying
    Lim, Joo Hwee
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1601 - 1605
  • [33] Cross-modal Semantic Alignment Pre-training for Vision-and-Language Navigation
    Wu, Siying
    Fu, Xueyang
    Wu, Feng
    Zha, Zheng-Jun
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4233 - 4241
  • [34] Cross-Modal Concept Learning and Inference for Vision-Language Models
    Zhang, Yi
    Zhang, Ce
    Tang, Yushun
    He, Zhihai
    NEUROCOMPUTING, 2024, 583
  • [35] Cross-modal interactions in language production: evidence from word learning
    Pinet, Svetlana
    Martin, Clara D.
    PSYCHONOMIC BULLETIN & REVIEW, 2025, 32 (01) : 452 - 462
  • [36] Lifelong Visual-Tactile Cross-Modal Learning for Robotic Material Perception
    Zheng, Wendong
    Liu, Huaping
    Sun, Fuchun
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (03) : 1192 - 1203
  • [37] Cross-modal Transfer Learning via Multi-grained Alignment for End-to-End Spoken Language Understanding
    Zhu, Yi
    Wang, Zexun
    Liu, Hang
    Wang, Peiying
    Feng, Mingchao
    Chen, Meng
    He, Xiaodong
    INTERSPEECH 2022, 2022, : 1131 - 1135
  • [38] Cross-Modal Generation and Pair Correlation Alignment Hashing
    Ou, Weihua
    Deng, Jiaxin
    Zhang, Lei
    Gou, Jianping
    Zhou, Quan
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (03) : 3018 - 3026
  • [39] Cross-Modal Graph Attention Network for Entity Alignment
    Xu, Baogui
    Xu, Chengjin
    Su, Bing
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 3715 - 3723
  • [40] Reinforced Cross-modal Alignment for Radiology Report Generation
    Qin, Han
    Song, Yan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), 2022, : 448 - 458