Enhanced Fine-Grained Motion Diffusion for Text-Driven Human Motion Synthesis

Cited by: 0
|
Authors
Wei, Dong [1 ]
Sun, Xiaoning [1 ]
Sun, Huaijiang [1 ]
Hu, Shengxiang [1 ]
Li, Bin [2 ]
Li, Weiqing [1 ]
Lu, Jianfeng [1 ]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing, Peoples R China
[2] Tianjin AiForward Sci & Technol Co Ltd, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
The emergence of text-driven motion synthesis techniques offers animators great potential for efficient creation. In most cases, however, textual expressions contain only general, qualitative motion descriptions and lack fine depiction and sufficient intensity, so the synthesized motions are either (a) semantically compliant but uncontrollable over specific pose details, or (b) even deviate from the provided descriptions, presenting animators with undesired results. In this paper, we propose DiffKFC, a conditional diffusion model for text-driven motion synthesis with KeyFrames Collaborated, enabling realistic generation under collaborative and efficient dual-level control: coarse guidance at the semantic level, plus direct and fine-grained depiction down to the body-posture level with only a few keyframes. Unlike existing inference-editing diffusion models that incorporate conditions without training, our conditional diffusion model is explicitly trained and can fully exploit the correlations among texts, keyframes and the diffused target frames. To preserve the control capability of discrete and sparse keyframes, we customize dilated mask attention modules in which only partial valid tokens, indicated by the dilated keyframe mask, participate in local-to-global attention. In addition, we develop a simple yet effective smoothness prior that steers the generated frames towards seamless keyframe transitions at inference. Extensive experiments show that our model not only achieves state-of-the-art performance in semantic fidelity but, more importantly, can satisfy animator requirements through fine-grained guidance without tedious labor.
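For intuition, the dilated keyframe mask and the masked attention described in the abstract can be sketched as follows. This is a minimal single-head NumPy illustration under assumed shapes; the function names, the dilation scheme (a fixed radius around each keyframe), and the masking-by-negative-infinity trick are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def dilated_keyframe_mask(T, keyframes, radius):
    """Boolean mask over T frames: True where a frame lies within
    `radius` of some keyframe (the dilated valid region)."""
    mask = np.zeros(T, dtype=bool)
    for k in keyframes:
        lo, hi = max(0, k - radius), min(T, k + radius + 1)
        mask[lo:hi] = True
    return mask

def masked_attention(Q, K, V, valid):
    """Scaled dot-product attention where only `valid` key tokens
    contribute: invalid keys get -inf scores before the softmax,
    so their attention weights become exactly zero."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)            # (T, T) attention logits
    scores[:, ~valid] = -np.inf              # block invalid key tokens
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)       # row-wise softmax
    return w @ V
```

Growing `radius` across stacked layers would let keyframe information propagate from local neighborhoods out to the whole sequence, matching the local-to-global attention the abstract mentions.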
Pages: 5876-5884
Page count: 9
Related papers
50 records in total
  • [1] Fg-T2M: Fine-Grained Text-Driven Human Motion Generation via Diffusion Model
    Wang, Yin
    Leng, Zhiying
    Li, Frederick W. B.
    Wu, Shun-Cheng
    Liang, Xiaohui
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 21978 - 21987
  • [2] MotionDiffuse: Text-Driven Human Motion Generation With Diffusion Model
    Zhang, Mingyuan
    Cai, Zhongang
    Pan, Liang
    Hong, Fangzhou
    Guo, Xinying
    Yang, Lei
    Liu, Ziwei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2024, 46 (06) : 4115 - 4128
  • [3] GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation
    Gao, Xuehao
    Yang, Yang
    Xie, Zhenyu
    Du, Shaoyi
    Sun, Zhongqian
    Wu, Yang
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2024, 30 (12) : 7518 - 7530
  • [4] LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model
    Sun, Haowen
    Zheng, Ruikun
    Huang, Haibin
    Ma, Chongyang
    Huang, Hui
    Hu, Ruizhen
    PROCEEDINGS OF SIGGRAPH 2024 CONFERENCE PAPERS, 2024,
  • [5] Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation
    Wang, Yin
    Li, Mu
    Liu, Jiapeng
    Leng, Zhiying
    Li, Frederick W. B.
    Zhang, Ziyao
    Liang, Xiaohui
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2025,
  • [6] 3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models
    Yang, Haibo
    Chen, Yang
    Pan, Yingwei
    Yao, Ting
    Chen, Zhineng
    Mei, Tao
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 6860 - 6868
  • [7] Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
    Yatim, Danah
    Fridman, Rafail
    Bar-Tal, Omer
    Kasten, Yoni
    Dekel, Tali
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2024, 2024, : 8466 - 8476
  • [8] Learning Fine-Grained Motion Embedding for Landscape Animation
    Xue, Hongwei
    Liu, Bei
    Yang, Huan
    Fu, Jianlong
    Li, Houqiang
    Luo, Jiebo
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 291 - 299
  • [9] Fine-grained uncertainty relations under relativistic motion
    Feng, Jun
    Zhang, Yao-Zhong
    Gould, Mark D.
    Fan, Heng
    EPL, 2018, 122 (06)
  • [10] Fine-Grained Motion Estimation for Video Frame Interpolation
    Yan, Bo
    Tan, Weimin
    Lin, Chuming
    Shen, Liquan
    IEEE TRANSACTIONS ON BROADCASTING, 2021, 67 (01) : 174 - 184