CSTalk: Correlation Supervised Speech-driven 3D Emotional Facial Animation Generation

Cited: 0
Authors
Liang, Xiangyu [1 ]
Zhuang, Wenlin [1 ]
Wang, Tianyong [1 ]
Geng, Guangxing [2 ]
Geng, Guangyue [2 ]
Xia, Haifeng [1 ]
Xia, Siyu [1 ]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing, Peoples R China
[2] Nanjing 8 8 Digital Technol Co Ltd, Nanjing, Peoples R China
Keywords
DOI
10.1109/FG59268.2024.10581920
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech-driven 3D facial animation technology has been developed for years, but its practical application still falls short of expectations. The main challenges lie in data limitations, lip alignment, and the naturalness of facial expressions. Although lip alignment has been widely studied, existing methods struggle to synthesize natural and realistic expressions, resulting in mechanical, stiff facial animations. Even when emotional features are extracted from speech, the randomness of facial movements limits the effective expression of emotion. To address this issue, this paper proposes CSTalk (Correlation Supervised), a method that models the correlations among different regions of facial movement and uses them to supervise the training of the generative model, producing realistic expressions that conform to human facial motion patterns. To generate more intricate animations, we employ a rich set of control parameters based on the MetaHuman character model and capture a dataset covering five different emotions. We train a generative network with an autoencoder structure and input an emotion embedding vector to enable user-controlled expression generation. Experimental results demonstrate that our method outperforms existing state-of-the-art methods.
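The abstract names two mechanisms: conditioning the autoencoder's decoder on an emotion embedding, and supervising the correlations among facial regions. The paper's actual architecture, losses, and dimensions are not given in this record, so the following is only a minimal numpy sketch under assumed (hypothetical) sizes, illustrating both ideas: a per-frame encoder/decoder whose decoder input is concatenated with a looked-up emotion embedding, and a loss that compares inter-region correlation matrices of predicted and reference control parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): per-frame audio feature size,
# latent size, number of facial control parameters, and five emotion classes.
AUDIO_DIM, LATENT_DIM, CTRL_DIM, N_EMOTIONS, EMB_DIM = 32, 16, 24, 5, 8

# Randomly initialised weights stand in for a trained model.
W_enc = rng.normal(0, 0.1, (AUDIO_DIM, LATENT_DIM))
E_emo = rng.normal(0, 0.1, (N_EMOTIONS, EMB_DIM))        # emotion embedding table
W_dec = rng.normal(0, 0.1, (LATENT_DIM + EMB_DIM, CTRL_DIM))

def generate(audio_feats, emotion_id):
    """Map per-frame audio features plus an emotion label to control parameters."""
    z = np.tanh(audio_feats @ W_enc)                      # encoder
    emo = np.broadcast_to(E_emo[emotion_id], (len(z), EMB_DIM))
    return np.concatenate([z, emo], axis=1) @ W_dec       # emotion-conditioned decoder

def correlation_loss(pred, target):
    """Penalise mismatch between the correlation matrices of the control channels,
    i.e. supervise how facial regions move together, not just per-frame values."""
    def corr(x):
        xc = x - x.mean(axis=0)
        cov = xc.T @ xc / len(x)
        std = np.sqrt(np.diag(cov)) + 1e-8
        return cov / np.outer(std, std)
    return float(np.mean((corr(pred) - corr(target)) ** 2))

frames = rng.normal(size=(100, AUDIO_DIM))                # 100 frames of audio features
out = generate(frames, emotion_id=2)                      # e.g. the third emotion class
loss = correlation_loss(out, rng.normal(size=(100, CTRL_DIM)))
```

In training, this correlation term would be added to a per-frame reconstruction loss so the generator matches not only the target control values but also the co-movement pattern across facial regions.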
Pages: 5