Dance2Music-Diffusion: leveraging latent diffusion models for music generation from dance videos

Cited by: 0
Authors: Zhang, Chaoyang [1]; Hua, Yan [1]
Affiliation: [1] Commun Univ China, Sch Informat & Commun Engn, 1 Dingfuzhuang East St, Beijing 100024, Peoples R China
Fund: National Natural Science Foundation of China
Keywords: Diffusion; Cross-modality; Dance to music; Transformer
DOI: 10.1186/s13636-024-00370-6
CLC number: O42 [Acoustics]
Discipline codes: 070206; 082403
Abstract:
With the rapid development of social networks, short videos, especially dance videos, have become a popular form of content. In this context, research on automatically generating music for dance videos has significant practical value. However, existing studies face challenges such as limited richness of musical timbre and poor synchronization with dance movements. In this paper, we propose Dance2Music-Diffusion, a novel framework for generating music from dance videos using latent diffusion models. Our approach comprises a motion encoder module for extracting motion features and a music diffusion generation module for generating latent music representations. By integrating dance-genre supervision with latent diffusion techniques, our framework outperforms existing methods in generating complex, rich dance music. We conducted objective and subjective evaluations of the results produced by various existing models on the AIST++ dataset. Our framework shows outstanding performance in beat recall, consistency with ground-truth beats, and coordination with dance movements. This work represents the state of the art in automatic music generation from dance videos, is easy to train, and has implications for enhancing entertainment experiences and inspiring innovative dance productions. Sample videos of our generated music and dance can be viewed at https://youtu.be/eCvLdLdkX-Y. The code is available at https://github.com/hellto/dance2music-diffusion.
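The abstract describes a two-stage pipeline: a motion encoder extracts features from the dance video, and a diffusion model generates a music latent conditioned on those features. The following is a minimal illustrative sketch of motion-conditioned DDPM reverse sampling in general, not the authors' implementation; the noise schedule, `toy_denoiser`, and all dimensions are hypothetical stand-ins (a real system would use a trained transformer denoiser and decode the latent to audio).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear noise schedule over T diffusion steps (illustrative values).
T = 50
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def toy_denoiser(z_t, t, motion_feat):
    """Stand-in for the learned noise predictor, conditioned on
    motion features; a real model would be a conditional transformer."""
    return 0.1 * z_t + 0.01 * motion_feat

def sample_music_latent(motion_feat, latent_dim=8):
    """Draw a music latent by iterating the standard DDPM reverse step,
    conditioned on dance-motion features."""
    z = rng.standard_normal(latent_dim)  # start from pure Gaussian noise
    for t in reversed(range(T)):
        eps_hat = toy_denoiser(z, t, motion_feat)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (z - coef * eps_hat) / np.sqrt(alphas[t])
        noise = rng.standard_normal(latent_dim) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise  # no noise at the final step
    return z

motion = rng.standard_normal(8)  # placeholder for motion-encoder output
latent = sample_music_latent(motion)
print(latent.shape)
```

The sketch only shows the sampling loop; training the denoiser and decoding the latent into a waveform are outside its scope.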
Pages: 12
Related papers (50 total)
  • [1] Motion to Dance Music Generation using Latent Diffusion Model
    Tan, Vanessa
    Nam, JungHyun
    Nam, Juhan
    Noh, Junyong
    PROCEEDINGS SIGGRAPH ASIA 2023 TECHNICAL COMMUNICATIONS, SA TECHNICAL COMMUNICATIONS 2023, 2023
  • [2] Quantized GAN for Complex Music Generation from Dance Videos
    Zhu, Ye
    Olszewski, Kyle
    Wu, Yu
    Achlioptas, Panos
    Chai, Menglei
    Yan, Yan
    Tulyakov, Sergey
    COMPUTER VISION, ECCV 2022, PT XXXVII, 2022, 13697 : 182 - 199
  • [3] Diffusion of Ecstasy in the Electronic Dance Music Scene
    Palamar, Joseph J.
    SUBSTANCE USE & MISUSE, 2020, 55 (13) : 2243 - 2250
  • [4] Music2Dance: DanceNet for Music-Driven Dance Generation
    Zhuang, Wenlin
    Wang, Congyi
    Chai, Jinxiang
    Wang, Yangang
    Shao, Ming
    Xia, Siyu
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (02)
  • [5] Update (Dance videos and keyboard music)
    Hilley, M
    CLAVIER, 1997, 36 (09): 48
  • [6] Discrete diffusion model with contrastive learning for music to natural and long dance generation
    Wang, Huaxin
    Jiang, Yujian
    Zhou, Xiangzhong
    Jiang, Wei
    npj Heritage Science, 13 (1)
  • [7] THE REMIX GENERATION + DANCE MUSIC
    WATNEY, S
    ARTFORUM, 1994, 33 (02): 15 - 16
  • [8] EDGE: Editable Dance Generation From Music
    Tseng, Jonathan
    Castellon, Rodrigo
    Liu, C. Karen
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 448 - 458
  • [9] DanceU: motion-and-music-based automatic effect generation for dance videos
    Pan, Yanjie
    Du, Yaru
    Wang, Shandong
    Ye, Yun
    Jiang, Yong
    Zhou, Zhen
    Xu, Li
    Lu, Ming
    Lin, Yunbiao
    Lu, Jiehui
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2093 - 2098
  • [10] Music-Driven Dance Generation
    Qi, Yu
    Liu, Yazhou
    Sun, Quansen
    IEEE ACCESS, 2019, 7 : 166540 - 166550