DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

Cited by: 4

Authors:

Qi, Qiaosong [1]

Zhuo, Le [2]

Zhang, Aixi [1]

Liao, Yue [2]

Fang, Fei [1]

Liu, Si [2]

Yan, Shuicheng [3]

Affiliations:

[1] Alibaba Grp, Beijing, Peoples R China

[2] Beihang Univ, Beijing, Peoples R China

[3] BAAI & Skywork AI, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023 | 2023

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China

Keywords:

Diffusion Model; Music-to-Dance; Conditional Generation; Multimodal Learning;

DOI:

10.1145/3581783.3612307

Chinese Library Classification:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task due to the physical constraints of human motion and rhythmic alignment with target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation. This model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model. To bridge the gap between music and motion for conditional generation, DiffDance employs a pretrained audio representation learning model to extract music embeddings and further align its embedding space to motion via contrastive loss. During training our cascaded diffusion model, we also incorporate multiple geometric losses to constrain the model outputs to be physically plausible and add a dynamic loss weight that adaptively changes over diffusion timesteps to facilitate sample diversity. Through comprehensive experiments performed on the benchmark dataset AIST++, we demonstrate that DiffDance is capable of generating realistic dance sequences that align effectively with the input music. These results are comparable to those achieved by state-of-the-art autoregressive methods.
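The abstract states that music embeddings are aligned to motion via a contrastive loss, without giving its exact form. As a rough illustration only, the sketch below computes a symmetric InfoNCE-style loss over a batch of paired embeddings. The function name `contrastive_loss`, the temperature value 0.07, and the choice of cosine similarity are assumptions made for this example, not details taken from the paper. It also assumes every embedding vector is non-zero.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean of -log softmax(row)[i] over rows i, i.e. each row's
    matching column is treated as the correct class."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(music_emb, motion_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss (hypothetical helper).

    music_emb[i] and motion_emb[i] are assumed to come from the same
    clip; every other pairing in the batch acts as a negative.
    """
    music = [_normalize(v) for v in music_emb]
    motion = [_normalize(v) for v in motion_emb]
    # Cosine similarity between every music/motion pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(mu, mo)) / temperature for mo in motion]
        for mu in music
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the music-to-motion and motion-to-music directions.
    return 0.5 * (_diagonal_cross_entropy(logits)
                  + _diagonal_cross_entropy(transposed))
```

Under these assumptions, the loss is small when each music embedding is most similar to its own motion embedding and grows when the pairs are mismatched, which is the behaviour an alignment objective of this kind relies on.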

Pages: 1374-1382

Page count: 9
