CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation

Cited: 0
Authors
Liang, Xiangyu [1 ]
Zhuang, Wenlin [1 ]
Wang, Tianyong [1 ]
Geng, Guangxing [2 ]
Geng, Guangyue [2 ]
Xia, Haifeng [1 ]
Xia, Siyu [1 ]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing, Peoples R China
[2] Nanjing 8 8 Digital Technol Co Ltd, Nanjing, Peoples R China
Keywords
DOI
10.1109/FG59268.2024.10581920
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech-driven 3D facial animation technology has been developed for years, but its practical application still falls short of expectations. The main challenges lie in data limitations, lip alignment, and the naturalness of facial expressions. Although lip alignment has been widely studied, existing methods struggle to synthesize natural and realistic expressions, resulting in mechanical, stiff facial animations. Even when emotional features are extracted from speech, the randomness of facial movements limits the effective expression of emotion. To address this issue, this paper proposes CSTalk (Correlation Supervised), a method that models the correlations among different regions of facial movement and uses them to supervise the training of the generative model, producing realistic expressions that conform to human facial motion patterns. To generate more intricate animations, we employ a rich set of control parameters based on the MetaHuman character model and capture a dataset covering five different emotions. We train a generative network with an autoencoder structure and input an emotion embedding vector to enable user-controlled expression generation. Experimental results demonstrate that our method outperforms existing state-of-the-art methods.
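The abstract names two mechanisms: conditioning the autoencoder's decoder on an emotion embedding, and supervising the correlations among facial regions. The paper's actual architecture, losses, and dimensions are not given in this record, so the following is only a minimal numpy sketch under assumed (hypothetical) sizes, illustrating both ideas: a per-frame encoder/decoder whose decoder input is concatenated with a looked-up emotion embedding, and a loss that compares inter-region correlation matrices of predicted and reference control parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): per-frame audio feature size,
# latent size, number of facial control parameters, and five emotion classes.
AUDIO_DIM, LATENT_DIM, CTRL_DIM, N_EMOTIONS, EMB_DIM = 32, 16, 24, 5, 8

# Randomly initialised weights stand in for a trained model.
W_enc = rng.normal(0, 0.1, (AUDIO_DIM, LATENT_DIM))
E_emo = rng.normal(0, 0.1, (N_EMOTIONS, EMB_DIM))        # emotion embedding table
W_dec = rng.normal(0, 0.1, (LATENT_DIM + EMB_DIM, CTRL_DIM))

def generate(audio_feats, emotion_id):
    """Map per-frame audio features plus an emotion label to control parameters."""
    z = np.tanh(audio_feats @ W_enc)                      # encoder
    emo = np.broadcast_to(E_emo[emotion_id], (len(z), EMB_DIM))
    return np.concatenate([z, emo], axis=1) @ W_dec       # emotion-conditioned decoder

def correlation_loss(pred, target):
    """Penalise mismatch between the correlation matrices of the control channels,
    i.e. supervise how facial regions move together, not just per-frame values."""
    def corr(x):
        xc = x - x.mean(axis=0)
        cov = xc.T @ xc / len(x)
        std = np.sqrt(np.diag(cov)) + 1e-8
        return cov / np.outer(std, std)
    return float(np.mean((corr(pred) - corr(target)) ** 2))

frames = rng.normal(size=(100, AUDIO_DIM))                # 100 frames of audio features
out = generate(frames, emotion_id=2)                      # e.g. the third emotion class
loss = correlation_loss(out, rng.normal(size=(100, CTRL_DIM)))
```

In training, this correlation term would be added to a per-frame reconstruction loss so the generator matches not only the target control values but also the co-movement pattern across facial regions.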
Pages: 5