LAGOON: Language-Guided Motion Control

Cited by: 0
Authors
Xu, Shusheng [1 ,2 ]
Wang, Huaijie [1 ,2 ]
Ouyang, Yutao [2 ,3 ]
Gao, Jiaxuan [1 ,2 ]
Meng, Zhiyu [1 ,2 ]
Yu, Chao [1 ]
Wu, Yi [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Shanghai Qi Zhi Inst, Shanghai, Peoples R China
[3] Xiamen Univ, Xiamen, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024) | 2024
DOI
10.1109/ICRA57147.2024.10610467
CLC Classification
TP [automation technology, computer technology];
Discipline Code
0812;
Abstract
We aim to control a robot to physically behave in the real world following any high-level language command, such as "cartwheel" or "kick". Although human motion datasets exist, this task remains challenging because generative models can produce physically unrealistic motions, a problem that is even more severe for robots, whose body structures and physical properties differ from humans'. Deploying such a motion on a physical robot raises further difficulties due to the sim2real gap. We develop LAnguage-Guided mOtion cONtrol (LAGOON), a multi-phase reinforcement learning (RL) method that generates physically realistic robot motions from language commands. LAGOON first leverages a pretrained model to generate a human motion from a language command. An RL phase then trains a control policy in simulation to mimic the generated human motion. Finally, with domain randomization, the learned policy can be deployed to a physical quadrupedal robot, which performs diverse behaviors in the real world in response to natural language commands.
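The three-phase pipeline the abstract describes (text-to-motion generation, RL-based motion mimicking in simulation, and domain-randomized sim2real transfer) can be sketched at toy scale. Everything below is a hypothetical illustration, not the authors' implementation: the pretrained text-to-motion model is stubbed with a seeded random trajectory, the "policy" is a single tracking gain improved by finite-difference descent on a motion-imitation tracking error, and domain randomization perturbs the simulated dynamics during evaluation.

```python
import random

# Stage 1 (stubbed): map a language command to a reference motion, here a
# short sequence of scalar target poses. A real system would call a
# pretrained text-to-motion generative model instead.
def generate_reference_motion(command: str, horizon: int = 5) -> list:
    rng = random.Random(sum(ord(c) for c in command))
    return [rng.uniform(-1.0, 1.0) for _ in range(horizon)]

def rollout_error(gain: float, reference: list, scale: float = 1.0) -> float:
    # Simulate a trivial first-order plant chasing each reference pose;
    # return the squared tracking error (negative imitation reward).
    state, error = 0.0, 0.0
    for target in reference:
        state += scale * gain * (target - state)
        error += (target - state) ** 2
    return error

# Stage 2 (toy stand-in for RL): improve the policy parameter by
# finite-difference descent on the tracking error.
def train_mimic_policy(reference: list, iterations: int = 200) -> float:
    gain, lr, eps = 0.0, 0.05, 1e-3
    for _ in range(iterations):
        grad = (rollout_error(gain + eps, reference)
                - rollout_error(gain, reference)) / eps
        gain = max(0.0, min(1.0, gain - lr * grad))
    return gain

# Stage 3: domain randomization -- evaluate the learned gain under
# perturbed dynamics so it transfers beyond the nominal simulator.
def evaluate_with_randomization(gain: float, reference: list,
                                trials: int = 20, noise: float = 0.1) -> float:
    rng = random.Random(0)
    total = 0.0
    for _ in range(trials):
        scale = 1.0 + rng.uniform(-noise, noise)  # randomized dynamics
        total += rollout_error(gain, reference, scale=scale)
    return total / trials

reference = generate_reference_motion("cartwheel")
gain = train_mimic_policy(reference)
robust_error = evaluate_with_randomization(gain, reference)
```

The three stages mirror the pipeline's structure only: LAGOON trains a full neural control policy with RL against a generated human motion, whereas this sketch optimizes one scalar gain against a scalar trajectory.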
Pages: 9743-9750
Page count: 8