Learning Music-Dance Representations Through Explicit-Implicit Rhythm Synchronization

Cited by: 4
Authors
Yu, Jiashuo [1, 2]
Pu, Junfu [3]
Cheng, Ying [4]
Feng, Rui [2]
Shan, Ying [3]
Affiliations
[1] PCG Tencent, ARC Lab, Shenzhen 518000, Peoples R China
[2] Fudan Univ, Sch Comp Sci, Shanghai Collaborat Innovat Ctr Intelligent Visua, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200438, Peoples R China
[3] Tencent, Appl Res Ctr, PCG, Shenzhen 518000, Peoples R China
[4] Fudan Univ, Acad Engn & Technol, Shanghai 200438, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Rhythm; Visualization; Humanities; Synchronization; Videos; Task analysis; Feature extraction; Multimodal learning; music and dance; self-supervised learning;
DOI
10.1109/TMM.2023.3303690
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Although audio-visual representation has been proven to be applicable in many downstream tasks, the representation of dancing videos, which is more specific and always accompanied by music with complex auditory contents, remains challenging and uninvestigated. Considering the intrinsic alignment between the cadent movement of the dancer and music rhythm, we introduce MuDaR, a novel Music-Dance Representation learning framework to perform the synchronization of music and dance rhythms both in explicit and implicit ways. Specifically, we derive the dance rhythms based on visual appearance and motion cues inspired by the music rhythm analysis. Then the visual rhythms are temporally aligned with the music counterparts, which are extracted by the amplitude of sound intensity. Meanwhile, we exploit the implicit coherence of rhythms implied in audio and visual streams by contrastive learning. The model learns the joint embedding by predicting the temporal consistency between audio-visual pairs. The music-dance representation, together with the capability of detecting audio and visual rhythms, can further be applied to three downstream tasks: (a) dance classification, (b) music-dance retrieval, and (c) music-dance retargeting. Extensive experiments demonstrate that our proposed framework outperforms other self-supervised methods by a large margin.
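The abstract names two mechanisms: explicit alignment of visual rhythms with music rhythms taken from sound-intensity amplitude, and implicit coherence learned contrastively. The sketch below illustrates those ideas only in toy form and is not the MuDaR implementation. Every function name (`rms_envelope`, `onset_strength`, `motion_envelope`, `best_lag`, `info_nce`) is hypothetical. Frame differencing as the motion cue, dot-product lag search as the alignment step, and an InfoNCE-style loss as the contrastive objective are assumptions swapped in for illustration, since the abstract does not specify these details.

```python
import math

def rms_envelope(samples, frame_len):
    """Per-frame RMS amplitude of a mono signal (a simple proxy for sound intensity)."""
    n_frames = len(samples) // frame_len
    env = []
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        env.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return env

def onset_strength(envelope):
    """Half-wave rectified first difference: large where intensity rises."""
    return [0.0] + [max(0.0, b - a) for a, b in zip(envelope, envelope[1:])]

def motion_envelope(frames):
    """Mean absolute difference between consecutive frames (each a flat pixel list).

    Assumption: plain frame differencing stands in for the paper's
    appearance and motion cues, which the abstract does not detail.
    """
    env = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        env.append(sum(abs(c - p) for p, c in zip(prev, cur)) / len(cur))
    return env

def best_lag(audio_rhythm, visual_rhythm, max_lag):
    """Lag (in frames) of the visual rhythm that maximizes dot-product overlap."""
    def overlap(lag):
        total = 0.0
        for t, a in enumerate(audio_rhythm):
            j = t + lag
            if 0 <= j < len(visual_rhythm):
                total += a * visual_rhythm[j]
        return total
    return max(range(-max_lag, max_lag + 1), key=overlap)

def info_nce(audio_embs, visual_embs, temperature=0.1):
    """InfoNCE-style loss: pair (i, i) is the positive, all other pairs are negatives."""
    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm if norm > 0 else 0.0

    loss = 0.0
    for i, a in enumerate(audio_embs):
        logits = [cos(a, v) / temperature for v in visual_embs]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(audio_embs)
```

In this toy setting, `best_lag(onset_strength(rms_envelope(audio, frame_len)), motion_envelope(frames), max_lag)` estimates how many frames the visual rhythm trails the audio rhythm, and `info_nce` is lower when temporally matching audio and visual embeddings are more similar than mismatched ones.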
Pages: 8454-8463
Number of pages: 10