CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation

Times cited: 0
Authors
Liang, Xiangyu [1]
Zhuang, Wenlin [1]
Wang, Tianyong [1]
Geng, Guangxing [2]
Geng, Guangyue [2]
Xia, Haifeng [1]
Xia, Siyu [1]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing, Peoples R China
[2] Nanjing 8 8 Digital Technol Co Ltd, Nanjing, Peoples R China
DOI
10.1109/FG59268.2024.10581920
CLC number: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Speech-driven 3D facial animation technology has been developed for years, but its practical application still falls short of expectations. The main challenges lie in data limitations, lip alignment, and the naturalness of facial expressions. Although lip alignment has been studied extensively, existing methods struggle to synthesize natural and realistic expressions, giving facial animations a mechanical and stiff appearance. Even when emotional features are extracted from speech, the randomness of facial movements limits the effective expression of emotion. To address this issue, this paper proposes CSTalk (Correlation Supervised), a method that models the correlations among movements of different facial regions and uses them to supervise the training of the generative model, producing realistic expressions that conform to human facial motion patterns. To generate more intricate animations, we employ a rich set of control parameters based on the MetaHuman character model and capture a dataset covering five different emotions. We train a generative network with an autoencoder structure and condition it on an emotion embedding vector to enable user-controlled expression generation. Experimental results demonstrate that our method outperforms existing state-of-the-art methods.
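
The abstract describes the pipeline only at a high level. Below is a minimal, hypothetical PyTorch sketch of that kind of design: an autoencoder-style network mapping per-frame speech features to facial control parameters, conditioned on a learned embedding for one of the five emotions, with an auxiliary loss that supervises correlations among facial regions. Every module choice, dimension (e.g. audio_dim, ctrl_dim, emo_dim), region grouping, and the exact form of the correlation loss here is an assumption for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn

    class CSTalkSketch(nn.Module):
        """Hypothetical sketch: speech features -> facial control parameters,
        conditioned on an emotion embedding. All sizes are illustrative."""

        def __init__(self, audio_dim=128, ctrl_dim=174, num_emotions=5,
                     emo_dim=16, hidden=256):
            super().__init__()
            self.emotion_embed = nn.Embedding(num_emotions, emo_dim)
            self.encoder = nn.GRU(audio_dim, hidden, batch_first=True)
            self.decoder = nn.Sequential(
                nn.Linear(hidden + emo_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, ctrl_dim),  # per-frame rig control parameters
            )

        def forward(self, audio_feats, emotion_id):
            h, _ = self.encoder(audio_feats)              # (B, T, hidden)
            e = self.emotion_embed(emotion_id)            # (B, emo_dim)
            e = e.unsqueeze(1).expand(-1, h.size(1), -1)  # broadcast over time
            return self.decoder(torch.cat([h, e], dim=-1))

    def region_correlation_loss(pred, target, regions):
        """Match predicted inter-region correlations to ground truth.
        `regions` maps region names to lists of control-parameter indices
        (an assumed grouping, e.g. {"brows": [...], "mouth": [...]})."""
        def region_signal(x):  # (B, T, C) -> (B, T, R): mean activation per region
            return torch.stack([x[..., idx].mean(dim=-1)
                                for idx in regions.values()], dim=-1)

        def corr(x):           # (B, T, R) -> (B, R, R) correlation over time
            x = x - x.mean(dim=1, keepdim=True)
            x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
            return x.transpose(1, 2) @ x

        return (corr(region_signal(pred)) - corr(region_signal(target))).abs().mean()

    # Usage with dummy tensors: batch of 2 clips, 100 frames each.
    model = CSTalkSketch()
    audio = torch.randn(2, 100, 128)
    emo = torch.tensor([0, 3])            # emotion labels in [0, 5)
    out = model(audio, emo)               # (2, 100, 174)

A training step under this reading would combine a per-frame reconstruction loss on the control parameters with the correlation term, weighted against each other.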
Pages: 5