DiffDance: Cascaded Human Motion Diffusion Model for Dance Generation

Cited by: 4
Authors
Qi, Qiaosong [1]
Zhuo, Le [2]
Zhang, Aixi [1]
Liao, Yue [2]
Fang, Fei [1]
Liu, Si [2]
Yan, Shuicheng [3]
Affiliations
[1] Alibaba Grp, Beijing, Peoples R China
[2] Beihang Univ, Beijing, Peoples R China
[3] BAAI & Skywork AI, Beijing, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Diffusion Model; Music-to-Dance; Conditional Generation; Multimodal Learning;
DOI
10.1145/3581783.3612307
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
When hearing music, it is natural for people to dance to its rhythm. Automatic dance generation, however, is a challenging task because the output must respect the physical constraints of human motion and stay rhythmically aligned with the target music. Conventional autoregressive methods introduce compounding errors during sampling and struggle to capture the long-term structure of dance sequences. To address these limitations, we present a novel cascaded motion diffusion model, DiffDance, designed for high-resolution, long-form dance generation. This model comprises a music-to-dance diffusion model and a sequence super-resolution diffusion model. To bridge the gap between music and motion for conditional generation, DiffDance employs a pretrained audio representation learning model to extract music embeddings and further aligns its embedding space with motion via a contrastive loss. While training the cascaded diffusion model, we also incorporate multiple geometric losses to constrain the model outputs to be physically plausible and add a dynamic loss weight that changes adaptively over diffusion timesteps to facilitate sample diversity. Through comprehensive experiments performed on the benchmark dataset AIST++, we demonstrate that DiffDance is capable of generating realistic dance sequences that align effectively with the input music. These results are comparable to those achieved by state-of-the-art autoregressive methods.
Pages: 1374-1382
Page count: 9