Speech-Driven Gesture Generation Using Transformer-Based Denoising Diffusion Probabilistic Models

Cited by: 0
Authors
Wu, Bowen [1, 2]
Liu, Chaoran [2, 3, 4]
Ishi, Carlos Toshinori [2, 3]
Ishiguro, Hiroshi [3]
Affiliations
[1] Osaka Univ, Grad Sch Engn Sci, Osaka 5650871, Japan
[2] RIKEN, Guardian Robot Project, Kyoto 6190288, Japan
[3] ATR, Hiroshi Ishiguro Labs, Kyoto 6190288, Japan
[4] Natl Inst Informat, Res & Dev Ctr Large Language Models, Tokyo 1018430, Japan
Keywords
Diffusion models; Data models; Transformers; Feature extraction; Noise reduction; Avatars; Skeleton; Robots; Motion segmentation; Motion capture; Co-speech gesture; deep learning; gesture-based interaction; social interaction
DOI
10.1109/THMS.2024.3456085
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
While it is crucial for human-like avatars to perform co-speech gestures, existing approaches struggle to generate natural and realistic movements. In the present study, we propose a novel transformer-based denoising diffusion model to generate co-speech gestures. Moreover, we introduce a practical sampling trick for diffusion models that maintains continuity between the generated motion segments while improving the within-segment motion likelihood and naturalness. Our model can be used for online generation since it generates gestures for a short segment of speech, e.g., 2 s. We evaluate our model on two large-scale speech-gesture datasets with finger movements using objective measurements and a user study, showing that our model outperforms all baselines. Our user study is based on the MetaHuman platform in Unreal Engine, a popular tool for creating human-like avatars and motions.
Pages: 733-742
Page count: 10