MLUG: Bootstrapping Language-Motion Pre-Training for Unified Motion-Language Understanding and Generation

Times Cited: 0
Authors
Luo, Hongliang [1 ]
Xi, Wei [1 ]
Tang, Daniel [2 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Comp Sci & Technol, Xian 710049, Peoples R China
[2] Mind Bridge AI Ltd, Ottawa, ON K1S 5R5, Canada
Keywords
motion generation; language motion; unified models;
DOI
10.3390/s24227354
Chinese Library Classification (CLC)
O65 [Analytical Chemistry];
Discipline Codes
070302; 081704;
Abstract
In the realm of computer vision and animation, the generation of human motion from textual descriptions represents a frontier of significant challenge and potential. This paper introduces MLUG, a groundbreaking framework poised to transform motion synthesis by harnessing the power of vision-language pre-training techniques. MLUG addresses the nuanced challenge of creating semantically rich, physically plausible, and emotionally expressive human motions through a novel integration of a unimodal encoder with motion-text contrastive loss, a motion-grounded text encoder, a motion-grounded motion decoder, and a motion length predictor. These components work in concert to align textual descriptions with dynamic motion sequences, offering an innovative solution to the limitations of existing models in open-vocabulary motion generation and emotional expressiveness. Through extensive evaluations, MLUG demonstrates unparalleled effectiveness in generating realistic and diverse motions from a broad spectrum of textual inputs, setting a new benchmark in the field.
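To make the motion-text contrastive objective named in the abstract concrete, the sketch below shows a symmetric InfoNCE-style loss over paired motion and text embeddings in PyTorch. This is a minimal illustration only, not the authors' released code: the feature dimensions, linear projection heads, and temperature value are assumptions standing in for MLUG's unimodal encoders and are not taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionTextContrastive(nn.Module):
    """Symmetric InfoNCE loss over paired motion and text embeddings.

    motion_dim, text_dim, embed_dim, and temperature are illustrative
    placeholders; the paper does not report these values here.
    """

    def __init__(self, motion_dim=263, text_dim=768, embed_dim=256, temperature=0.07):
        super().__init__()
        # Linear projection heads standing in for the unimodal motion
        # and text encoders described in the abstract.
        self.motion_proj = nn.Linear(motion_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.temperature = temperature

    def forward(self, motion_feats, text_feats):
        # Project both modalities into a shared space and L2-normalize.
        m = F.normalize(self.motion_proj(motion_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)

        # Cosine-similarity logits; matching pairs lie on the diagonal.
        logits = m @ t.T / self.temperature
        targets = torch.arange(m.size(0), device=m.device)

        # Symmetric cross-entropy: motion-to-text and text-to-motion.
        loss_m2t = F.cross_entropy(logits, targets)
        loss_t2m = F.cross_entropy(logits.T, targets)
        return 0.5 * (loss_m2t + loss_t2m)

if __name__ == "__main__":
    # Toy batch: 8 pooled motion features paired with 8 pooled text features.
    loss_fn = MotionTextContrastive()
    motion = torch.randn(8, 263)
    text = torch.randn(8, 768)
    print(loss_fn(motion, text).item())

In practice such a loss aligns the two embedding spaces so that a text query retrieves semantically matching motions; the motion-grounded decoder and length predictor described in the abstract would sit on top of the aligned representations.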
Pages: 13
Related Papers
50 records in total
  • [41] Visual motion aftereffect from understanding motion language
    Dils, Alexia Toskos
    Boroditsky, Lera
    PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2010, 107 (37) : 16396 - 16400
  • [42] Unified Dialog Model Pre-training for Task-Oriented Dialog Understanding and Generation
    He, Wanwei
    Dai, Yinpei
    Yang, Min
    Sun, Jian
    Huang, Fei
    Si, Luo
    Li, Yongbin
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 187 - 200
  • [43] Pre-Training Language Models for Identifying Patronizing and Condescending Language: An Analysis
    Perez-Almendros, Carla
    Espinosa-Anke, Luis
    Schockaert, Steven
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 3902 - 3911
  • [44] Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training
    Moon, Jong Hak
    Lee, Hyungyung
    Shin, Woncheol
    Kim, Young-Hak
    Choi, Edward
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26 (12) : 6070 - 6080
  • [45] A New Pre-training Method for Training Deep Learning Models with Application to Spoken Language Understanding
    Celikyilmaz, Asli
    Sarikaya, Ruhi
    Hakkani-Tur, Dilek
    Liu, Xiaohu
    Ramesh, Nikhil
    Tur, Gokhan
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3255 - 3259
  • [46] MVP: Multi-task Supervised Pre-training for Natural Language Generation
    Tang, Tianyi
    Li, Junyi
    Zhao, Wayne Xin
    Wen, Ji-Rong
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 8758 - 8794
  • [47] Task-adaptive Pre-training and Self-training are Complementary for Natural Language Understanding
    Li, Shiyang
    Yavuz, Semih
    Chen, Wenhu
    Yan, Xifeng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1006 - 1015
  • [48] Understanding and Mitigating the Soft Error of Contrastive Language-Image Pre-training Models
    Shi, Yihao
    Wang, Bo
    Luo, Shengbai
    Xue, Qingshan
    Zhang, Xueyi
    Ma, Sheng
    8TH INTERNATIONAL TEST CONFERENCE IN ASIA, ITC-ASIA 2024, 2024,
  • [49] Efficient learning for spoken language understanding tasks with word embedding based pre-training
    Luan, Yi
    Watanabe, Shinji
    Harsham, Bret
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1398 - 1402
  • [50] QUERT: Continual Pre-training of Language Model for Query Understanding in Travel Domain Search
    Xie, Jian
    Liang, Yidan
    Liu, Jingping
    Xiao, Yanghua
    Wu, Baohua
    Ni, Shenghua
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 5282 - 5291