CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation

Times cited: 0

Authors:

Chu, Zhaojie [1]

Guo, Kailing [1,2]

Xing, Xiaofen [1]

Lan, Yilin [3]

Cai, Bolun [4]

Xu, Xiangmin [2,3,5]

Affiliations:

[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510640, Peoples R China

[2] Pazhou Lab, Guangzhou 510335, Peoples R China

[3] South China Univ Technol, Sch Future Technol, Guangzhou 510640, Peoples R China

[4] ByteDance Inc, Shenzhen 518000, Peoples R China

[5] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2024 / Vol. 34 / No. 09

Keywords:

3D facial animation; hierarchical speech features; 3D talking head; facial activity variance; transformer; NETWORK;

DOI:

10.1109/TCSVT.2024.3386836

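Chinese Library Classification (CLC) numbers:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809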

Abstract:

Speech-driven 3D facial animation is a challenging cross-modal task that has attracted growing research interest. During speech, the mouth exhibits strong motion, while other facial regions typically show comparatively weak activity. Existing approaches often simplify the problem by directly mapping single-level speech features to the entire facial animation; this overlooks differences in facial activity intensity and leads to overly smoothed facial movements. In this study, we propose a novel framework, CorrTalk, which effectively establishes the temporal correlation between hierarchical speech features and facial activities of different intensities across distinct regions. A novel facial activity intensity prior, obtained by statistically analyzing facial animations, is defined to distinguish between strong and weak facial activity. Based on this prior, we propose a dual-branch decoding framework that synthesizes strong and weak facial activity synchronously, enabling facial animation across a wider range of intensities. Furthermore, a weighted hierarchical feature encoder is proposed to establish the temporal correlation between hierarchical speech features and facial activity at different intensities, which ensures lip synchronization and plausible facial expressions. Extensive qualitative and quantitative experiments, as well as a user study, indicate that CorrTalk outperforms existing state-of-the-art methods. The source code and supplementary video are publicly available at: https://zjchu.github.io/projects/CorrTalk/.
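The abstract describes a facial activity intensity prior derived from statistics of facial animations and used to route synthesis into strong and weak branches. The sketch below shows one way such a prior could be computed. It is not the authors' implementation. It makes several hypothetical assumptions:

- Each training sequence is a list of frames, and each frame is a list of per-vertex (x, y, z) positions.
- Activity is measured as each vertex's variance over time, summed over the three axes.
- Vertices at or above an assumed `quantile` of that variance are labeled "strong".

The `merge_branches` helper is also illustrative only. It shows how per-vertex outputs of two decoding branches could be combined with such a mask.

```python
from statistics import pvariance

# Assumed data layout (hypothetical, not from the paper): a sequence is a list
# of frames, and each frame is a list of per-vertex (x, y, z) positions.


def vertex_activity_variance(sequences):
    """Return one activity score per vertex.

    For every sequence, the variance over time of each coordinate is summed
    over x, y and z; the per-sequence scores are then averaged.
    """
    num_vertices = len(sequences[0][0])
    totals = [0.0] * num_vertices
    for seq in sequences:
        for v in range(num_vertices):
            totals[v] += sum(
                pvariance([frame[v][axis] for frame in seq]) for axis in range(3)
            )
    return [t / len(sequences) for t in totals]


def strong_activity_mask(variances, quantile=0.8):
    """Label vertices at or above the given variance quantile as strong (True)."""
    ranked = sorted(variances)
    index = min(int(quantile * len(ranked)), len(ranked) - 1)
    threshold = ranked[index]
    return [v >= threshold for v in variances]


def merge_branches(strong_out, weak_out, mask):
    """Take each vertex from the strong branch where the mask is True, else weak."""
    return [s if m else w for s, w, m in zip(strong_out, weak_out, mask)]


if __name__ == "__main__":
    # Toy data: vertex 0 moves a lot, vertex 1 barely moves, vertex 2 is static.
    rest = [(0.0, 0.0, 0.0)] * 3
    moved = [(0.0, 1.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.0)]
    sequences = [[rest, moved], [rest, moved]]
    variances = vertex_activity_variance(sequences)
    mask = strong_activity_mask(variances)
    print(variances, mask)
```

With this toy data, the moving vertex is labeled strong and the nearly static ones weak. On real meshes, the choice of threshold would decide how many vertices each branch is responsible for.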


Pages: 8953-8965

Page count: 13
