MLUG: Bootstrapping Language-Motion Pre-Training for Unified Motion-Language Understanding and Generation

Cited by: 0

Authors:

Luo, Hongliang [1]

Xi, Wei [1]

Tang, Daniel [2]

Affiliations:

[1] Xi An Jiao Tong Univ, Sch Comp Sci & Technol, Xian 710049, Peoples R China

[2] Mind Bridge AI Ltd, Ottawa, ON K1S 5R5, Canada

Source:

SENSORS | 2024 / Vol. 24 / Issue 22

Keywords:

motion generation; language motion; unified models;

DOI:

10.3390/s24227354

Chinese Library Classification (CLC) number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

In the realm of computer vision and animation, the generation of human motion from textual descriptions represents a frontier of significant challenge and potential. This paper introduces MLUG, a groundbreaking framework poised to transform motion synthesis by harnessing the power of vision-language pre-training techniques. MLUG addresses the nuanced challenge of creating semantically rich, physically plausible, and emotionally expressive human motions through a novel integration of a unimodal encoder with motion-text contrastive loss, a motion-grounded text encoder, a motion-grounded motion decoder, and a motion length predictor. These components work in concert to align textual descriptions with dynamic motion sequences, offering an innovative solution to the limitations of existing models in open-vocabulary motion generation and emotional expressiveness. Through extensive evaluations, MLUG demonstrates unparalleled effectiveness in generating realistic and diverse motions from a broad spectrum of textual inputs, setting a new benchmark in the field.

Pages: 13
