CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation

Times cited: 0
Authors
Chu, Zhaojie [1]
Guo, Kailing [1, 2]
Xing, Xiaofen [1]
Lan, Yilin [3]
Cai, Bolun [4]
Xu, Xiangmin [2, 3, 5]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510640, Peoples R China
[2] Pazhou Lab, Guangzhou 510335, Peoples R China
[3] South China Univ Technol, Sch Future Technol, Guangzhou 510640, Peoples R China
[4] ByteDance Inc, Shenzhen 518000, Peoples R China
[5] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China
Keywords
3D facial animation; hierarchical speech features; 3D talking head; facial activity variance; transformer; NETWORK;
DOI
10.1109/TCSVT.2024.3386836
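Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract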
Speech-driven 3D facial animation is a challenging cross-modal task that has attracted growing research interest. During speech, the mouth moves strongly, while the other facial regions typically show comparatively weak activity. Existing approaches often simplify the process by directly mapping single-level speech features to the entire face, which overlooks the differences in facial activity intensity and leads to overly smoothed facial movements. In this study, we propose a novel framework, CorrTalk, which establishes the temporal correlation between hierarchical speech features and facial activities of different intensities across distinct regions. A novel facial activity intensity prior, obtained by statistically analyzing facial animations, is defined to distinguish between strong and weak facial activity. Based on this prior, we propose a dual-branch decoding framework that synthesizes strong and weak facial activity synchronously, which supports the synthesis of facial animation over a wider range of intensities. Furthermore, a weighted hierarchical feature encoder is proposed to establish the temporal correlation between hierarchical speech features and facial activity at different intensities, which ensures lip-sync and plausible facial expressions. Extensive qualitative and quantitative experiments, as well as a user study, indicate that CorrTalk outperforms existing state-of-the-art methods. The source code and supplementary video are publicly available at: https://zjchu.github.io/projects/CorrTalk/.
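The abstract does not say how the facial activity intensity prior is computed, so the following is only an illustrative sketch of one way such a statistic could be obtained from animation data. It assumes that a vertex's activity is its mean distance from its own average position over time and that a fixed threshold separates strong from weak regions. The function names `activity_intensity` and `split_regions`, the threshold value, and the toy data are all hypothetical and are not taken from the paper or its released code.

```python
import math

def activity_intensity(frames):
    """Per-vertex activity: mean distance from the vertex's average position.

    frames: list of frames, each a list of (x, y, z) vertex positions.
    This statistic is an assumption for illustration, not the paper's definition.
    """
    n = len(frames)
    intensities = []
    for v in range(len(frames[0])):
        # Average position of vertex v over all frames.
        mean = [sum(f[v][k] for f in frames) / n for k in range(3)]
        # Mean Euclidean distance of vertex v from that average position.
        total = 0.0
        for f in frames:
            total += math.sqrt(sum((f[v][k] - mean[k]) ** 2 for k in range(3)))
        intensities.append(total / n)
    return intensities

def split_regions(intensities, threshold):
    """Split vertex indices into strong and weak activity by a threshold."""
    strong = [i for i, a in enumerate(intensities) if a >= threshold]
    weak = [i for i, a in enumerate(intensities) if a < threshold]
    return strong, weak

# Toy data: vertex 0 moves along y (lip-like), vertex 1 stays still (forehead-like).
frames = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 1.0)],
    [(0.0, 2.0, 0.0), (1.0, 1.0, 1.0)],
]
intensities = activity_intensity(frames)
strong, weak = split_regions(intensities, threshold=0.5)  # hypothetical threshold
```

In a dual-branch design like the one the abstract describes, index sets such as `strong` and `weak` could decide which vertices each decoder branch handles. How CorrTalk actually applies its prior is described in the paper and its source code.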
Pages: 8953-8965
Number of pages: 13