CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation

Citations: 0
Authors
Chu, Zhaojie [1 ]
Guo, Kailing [1 ,2 ]
Xing, Xiaofen [1 ]
Lan, Yilin [3 ]
Cai, Bolun [4 ]
Xu, Xiangmin [2 ,3 ,5 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510640, Peoples R China
[2] Pazhou Lab, Guangzhou 510335, Peoples R China
[3] South China Univ Technol, Sch Future Technol, Guangzhou 510640, Peoples R China
[4] ByteDance Inc, Shenzhen 518000, Peoples R China
[5] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China
Keywords
3D facial animation; hierarchical speech features; 3D talking head; facial activity variance; transformer; NETWORK;
DOI
10.1109/TCSVT.2024.3386836
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Speech-driven 3D facial animation is a challenging cross-modal task that has attracted growing research interest. During speaking activities, the mouth displays strong motions, while the other facial regions typically demonstrate comparatively weak activity levels. Existing approaches often simplify the process by directly mapping single-level speech features to the entire facial animation, which overlooks the differences in facial activity intensity and leads to overly smoothed facial movements. In this study, we propose a novel framework, CorrTalk, which effectively establishes the temporal correlation between hierarchical speech features and facial activities of different intensities across distinct regions. A novel facial activity intensity prior, obtained by statistically analyzing facial animations, is defined to distinguish between strong and weak facial activity. Based on this prior, we propose a dual-branch decoding framework that synchronously synthesizes strong and weak facial activity, guaranteeing facial animation synthesis across a wider range of intensities. Furthermore, a weighted hierarchical feature encoder is proposed to establish the temporal correlation between hierarchical speech features and facial activity at different intensities, ensuring lip-sync and plausible facial expressions. Extensive qualitative and quantitative experiments, as well as a user study, indicate that CorrTalk outperforms existing state-of-the-art methods. The source code and supplementary video are publicly available at: https://zjchu.github.io/projects/CorrTalk/.
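The facial activity intensity prior described in the abstract — statistically separating strong-activity regions (e.g. the mouth) from weak ones — could be sketched as below. This is a minimal illustration, not the paper's exact procedure: the per-vertex variance measure, the top-ratio threshold rule, and the function name `facial_activity_prior` are all assumptions made for clarity.

```python
import numpy as np

def facial_activity_prior(seqs, ratio=0.2):
    """Split mesh vertices into strong/weak activity groups.

    seqs: list of (T, V, 3) arrays of vertex positions over time.
    ratio: fraction of vertices to label as strong activity (assumed rule).
    Returns a boolean mask of shape (V,), True = strong activity.
    """
    per_vertex = []
    for s in seqs:
        disp = s - s.mean(axis=0, keepdims=True)          # motion about the mean pose
        per_vertex.append((disp ** 2).sum(axis=-1).mean(axis=0))  # variance per vertex
    var = np.mean(per_vertex, axis=0)                     # average over sequences
    k = max(1, int(ratio * var.shape[0]))
    strong = np.zeros(var.shape[0], dtype=bool)
    strong[np.argsort(var)[-k:]] = True                   # top-ratio most active vertices
    return strong

# Toy demo: vertices 0-4 jitter strongly (a stand-in "mouth"), 5-19 barely move.
rng = np.random.default_rng(0)
seq = rng.normal(0, 0.01, size=(50, 20, 3))
seq[:, :5] += rng.normal(0, 1.0, size=(50, 5, 3))
mask = facial_activity_prior([seq], ratio=0.25)
print(mask[:5].all(), mask[5:].sum())  # True 0
```

In this sketch the resulting boolean mask would route vertices to the strong- or weak-activity decoding branch that the abstract describes.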
Pages: 8953-8965
Page count: 13
Related Papers (50 total; items [21]-[30] shown)
  • [21] 3D facial animation driven by speech-video dual-modal signals. Ji, Xuejie; Liao, Zhouzhou; Dong, Lanfang; Tang, Yingchao; Li, Guoming; Mao, Meng. Complex & Intelligent Systems, 2024, 10(05): 5951-5964.
  • [22] A review regarding the 3D facial animation pipeline. de Carvalho Cruz, Artur Tavares; Teixeira, Joao Marcelo. Proceedings of Symposium on Virtual and Augmented Reality (SVR 2021), 2021: 192-196.
  • [23] 3D facial animation based on texture mapping. Tseng, Din-Chang; Lu, Chang-Yang; Wei, Shu-Chen. Proceedings of the National Science Council, Republic of China, Part A: Physical Science and Engineering, 1996, 20(02).
  • [24] 3D facial modeling for animation: A nonlinear approach. Wang, Yushun; Zhuang, Yueting. Advances in Multimedia Modeling, Pt 1, 2007, 4351: 64-73.
  • [25] LBF based 3D regression for facial animation. Yan, Congquan; Wang, Liang-Hao; Li, Jianing; Li, Dong-Xiao; Zhang, Ming. 2016 International Conference on Virtual Reality and Visualization (ICVRV 2016), 2016: 276-279.
  • [26] A new method of 3D facial expression animation. Sun, Shuo; Ge, Chunbao. Journal of Applied Mathematics, 2014.
  • [27] Speech-driven facial animation using a hierarchical model. Cosker, DP; Marshall, AD; Rosin, PL; Hicks, YA. IEE Proceedings - Vision, Image and Signal Processing, 2004, 151(04): 314-321.
  • [28] Joint audio-text model for expressive speech-driven 3D facial animation. Fan, Yingruo; Lin, Zhaojiang; Saito, Jun; Wang, Wenping; Komura, Taku. Proceedings of the ACM on Computer Graphics and Interactive Techniques, 2022, 5(01).
  • [29] A muscle-based 3D parametric lip model for speech-synchronized facial animation. King, SA; Parent, RE; Olsafsky, BL. Deformable Avatars, 2001, 68: 12-23.
  • [30] Individual 3D face synthesis based on orthogonal photos and speech-driven facial animation. Shan, SG; Gao, W; Yan, J; Zhang, HM; Chen, XL. 2000 International Conference on Image Processing, Vol III, Proceedings, 2000: 238-241.