ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

Cited by: 13
Authors
Zhang, Mingyuan [1]
Guo, Xinying [1]
Pan, Liang [1]
Cai, Zhongang [1,2]
Hong, Fangzhou [1]
Li, Huirong [1]
Yang, Lei [2]
Liu, Ziwei [1]
Affiliations
[1] Nanyang Technol Univ, S Lab, Singapore, Singapore
[2] Sensetime, Shanghai, Peoples R China
DOI
10.1109/ICCV51070.2023.00040
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
3D human motion generation is crucial for the creative industry. Recent advances rely on generative models with domain knowledge for text-driven motion generation, leading to substantial progress in capturing common motions. However, the performance on more diverse motions remains unsatisfactory. In this work, we propose ReMoDiffuse, a diffusion-model-based motion generation framework that integrates a retrieval mechanism to refine the denoising process. ReMoDiffuse enhances the generalizability and diversity of text-driven motion generation with three key designs: 1) Hybrid Retrieval finds appropriate references from the database in terms of both semantic and kinematic similarities. 2) Semantic-Modulated Transformer selectively absorbs retrieval knowledge, adapting to the difference between retrieved samples and the target motion sequence. 3) Condition Mixture better utilizes the retrieval database during inference, overcoming the scale sensitivity in classifier-free guidance. Extensive experiments demonstrate that ReMoDiffuse outperforms state-of-the-art methods by balancing both text-motion consistency and motion quality, especially for more diverse motion generation. Project page: https://mingyuan-zhang.github.io/projects/ReMoDiffuse.html
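The Hybrid Retrieval idea in the abstract (ranking database entries by both semantic and kinematic similarity) can be illustrated with a small sketch. Everything below is an assumption for illustration, not the authors' implementation: captions are assumed to be pre-embedded as vectors by some text encoder, semantic similarity is taken as cosine similarity, and kinematic similarity is approximated by the relative gap between the requested motion length and each database motion's length, decayed by a hypothetical parameter `lam`. The names `Entry`, `cosine`, `hybrid_score`, and `retrieve` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Entry:
    """Hypothetical database item: a pre-computed caption embedding and a motion length in frames."""
    text_embedding: Sequence[float]
    length: int  # assumed positive


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_score(query_emb: Sequence[float], query_len: int, entry: Entry, lam: float = 1.0) -> float:
    """Assumed hybrid score: semantic similarity scaled by a length-based kinematic factor."""
    # Semantic term: closeness of the two captions in embedding space.
    semantic = cosine(query_emb, entry.text_embedding)
    # Kinematic term (assumption): relative length gap in [0, 1), mapped to (0, 1] by exp(-lam * gap).
    gap = abs(entry.length - query_len) / max(entry.length, query_len)
    kinematic = math.exp(-lam * gap)
    return semantic * kinematic


def retrieve(query_emb: Sequence[float], query_len: int, database: List[Entry],
             k: int = 2, lam: float = 1.0) -> List[int]:
    """Indices of the k highest-scoring database entries, best first."""
    scores = [hybrid_score(query_emb, query_len, e, lam) for e in database]
    ranked = sorted(range(len(database)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


db = [
    Entry([1.0, 0.0, 0.0], 120),  # same caption meaning, similar length
    Entry([1.0, 0.0, 0.0], 30),   # same caption meaning, very different length
    Entry([0.0, 1.0, 0.0], 120),  # unrelated caption
]
print(retrieve([1.0, 0.0, 0.0], 100, db, k=2))  # -> [0, 1]
```

Multiplying the two terms means a candidate has to do reasonably well on both to rank highly; a weighted sum would be an equally plausible way to combine them in a sketch like this.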
Pages: 364-373
Page count: 10
Related papers
50 in total
  • [21] Towards Retrieval-Augmented Architectures for Image Captioning
    Sarto, Sara
    Cornia, Marcella
    Baraldi, Lorenzo
    Nicolosi, Alessandro
    Cucchiara, Rita
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (08)
  • [22] Retrieval-augmented Generation across Heterogeneous Knowledge
    Yu, Wenhao
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2022, : 52 - 58
  • [23] Hierarchical Indexing for Retrieval-Augmented Opinion Summarization
    Hosking, Tom
    Tang, Hao
    Lapata, Mirella
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2024, 12 : 1533 - 1555
  • [24] Retrieval-augmented Video Encoding for Instructional Captioning
    Jung, Yeonjoon
    Kim, Minsoo
    Choi, Seungtaek
    Seo, Minji
    Hwang, Seung-won
    Kim, Jihyuk
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 8554 - 8568
  • [25] The Journey to A Knowledgeable Assistant with Retrieval-Augmented Generation (RAG)
    Dong, Xin Luna
    COMPANION OF THE 2024 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, SIGMOD-COMPANION 2024, 2024, : 3 - 3
  • [26] LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
    Zhao, Pancheng
    Xu, Peng
    Qin, Pengda
    Fan, Deng-Ping
    Zhang, Zhicheng
    Jia, Guoli
    Zhou, Bowen
    Yang, Jufeng
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 4092 - 4101
  • [27] Neural Image Popularity Assessment with Retrieval-augmented Transformer
    Ji, Liya
    Park, Chan Ho
    Rao, Zhefan
    Chen, Qifeng
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2427 - 2436
  • [28] Query Rewriting for Retrieval-Augmented Large Language Models
    Ma, Xinbei
    Gong, Yeyun
    He, Pengcheng
    Zhao, Hai
    Duan, Nan
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 5303 - 5315
  • [29] RIGHT: Retrieval-Augmented Generation for Mainstream Hashtag Recommendation
    Fan, Run-Ze
    Fan, Yixing
    Chen, Jiangui
    Guo, Jiafeng
    Zhang, Ruqing
    Cheng, Xueqi
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT I, 2024, 14608 : 39 - 55
  • [30] Retrieval-Augmented Code Generation for Universal Information Extraction
    Guo, Yucan
    Li, Zixuan
    Jin, Xiaolong
    Liu, Yantao
    Zeng, Yutao
    Liu, Wenxuan
    Li, Xiang
    Yang, Pan
    Bai, Long
    Guo, Jiafeng
    Chen, Xueqi
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT II, NLPCC 2024, 2025, 15360 : 30 - 42