MLUG: Bootstrapping Language-Motion Pre-Training for Unified Motion-Language Understanding and Generation

Times cited: 0
Authors
Luo, Hongliang [1]
Xi, Wei [1]
Tang, Daniel [2]
Affiliations
[1] Xi'an Jiaotong University, School of Computer Science and Technology, Xi'an 710049, People's Republic of China
[2] Mind Bridge AI Ltd, Ottawa, ON K1S 5R5, Canada
Keywords
motion generation; language motion; unified models
DOI
10.3390/s24227354
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
In computer vision and animation, generating human motion from textual descriptions is a frontier of significant challenge and potential. This paper introduces MLUG, a framework for text-driven motion synthesis that adapts vision-language pre-training techniques to the motion domain. MLUG targets human motions that are semantically rich, physically plausible, and emotionally expressive by integrating four components: a unimodal encoder trained with a motion-text contrastive loss, a motion-grounded text encoder, a motion-grounded motion decoder, and a motion length predictor. Together, these components align textual descriptions with dynamic motion sequences, addressing the limitations of existing models in open-vocabulary motion generation and emotional expressiveness. Extensive evaluations show that MLUG generates realistic and diverse motions from a broad spectrum of textual inputs, setting a new benchmark in the field.
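The abstract names a motion-text contrastive loss for the unimodal encoder but does not give its form. In vision-language pre-training (for example CLIP, or the image-text contrastive objective in BLIP), such a loss is usually a symmetric InfoNCE over in-batch pairs, and the sketch below assumes MLUG's version follows that pattern. The function names, the pure-Python implementation, and the temperature of 0.07 are illustrative assumptions, not the authors' code.

```python
import math
import random
from typing import List

Vector = List[float]


def l2_normalize(v: Vector, eps: float = 1e-8) -> Vector:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) + eps
    return [x / norm for x in v]


def log_softmax(row: List[float]) -> List[float]:
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def motion_text_contrastive_loss(
    motion_emb: List[Vector], text_emb: List[Vector], temperature: float = 0.07
) -> float:
    """Symmetric InfoNCE loss over a batch of paired (motion, text) embeddings.

    Assumption (hypothetical setup): motion_emb[i] and text_emb[i] describe the
    same clip, and every other item in the batch acts as a negative.
    """
    if len(motion_emb) != len(text_emb):
        raise ValueError("motion and text batches must have the same size")
    m = [l2_normalize(v) for v in motion_emb]
    t = [l2_normalize(v) for v in text_emb]
    n = len(m)
    # logits[i][j] = cosine(motion_i, text_j) / temperature
    logits = [
        [sum(a * b for a, b in zip(m[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Motion-to-text: row i should assign the highest score to caption i.
    loss_m2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-motion: column j should assign the highest score to motion j.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2m = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_m2t + loss_t2m)


if __name__ == "__main__":
    rng = random.Random(0)
    dim, batch = 64, 8
    # Toy stand-ins for encoder outputs (hypothetical, random data).
    text = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(batch)]
    # Matched pairs: each motion embedding is a slightly perturbed copy of its caption.
    aligned = [[x + 0.1 * rng.gauss(0.0, 1.0) for x in v] for v in text]
    # Mismatched pairs: shift by one so no motion lines up with its own caption.
    mismatched = aligned[1:] + aligned[:1]
    print(motion_text_contrastive_loss(aligned, text))
    print(motion_text_contrastive_loss(mismatched, text))
```

Matched pairs give a loss near zero and the shifted batch a much larger one. Averaging the two directions keeps the objective symmetric, so the encoder is trained for both motion-to-text and text-to-motion retrieval.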
Pages: 13