CorrTalk: Correlation Between Hierarchical Speech and Facial Activity Variances for 3D Animation

Authors
Chu, Zhaojie [1]
Guo, Kailing [1, 2]
Xing, Xiaofen [1]
Lan, Yilin [3]
Cai, Bolun [4]
Xu, Xiangmin [2, 3, 5]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510640, Peoples R China
[2] Pazhou Lab, Guangzhou 510335, Peoples R China
[3] South China Univ Technol, Sch Future Technol, Guangzhou 510640, Peoples R China
[4] ByteDance Inc, Shenzhen 518000, Peoples R China
[5] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China
Keywords
3D facial animation; hierarchical speech features; 3D talking head; facial activity variance; transformer; NETWORK;
DOI
10.1109/TCSVT.2024.3386836
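Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809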
Abstract
Speech-driven 3D facial animation is a challenging cross-modal task that has attracted growing research interest. During speech, the mouth moves strongly, while the other facial regions typically show comparatively weak activity. Existing approaches often simplify the process by directly mapping single-level speech features to the entire facial animation, which overlooks the differences in facial activity intensity and leads to overly smoothed facial movements. In this study, we propose a novel framework, CorrTalk, which effectively establishes the temporal correlation between hierarchical speech features and facial activities of different intensities across distinct regions. A novel facial activity intensity prior, obtained by statistically analyzing facial animations, is defined to distinguish between strong and weak facial activity. Based on this prior, we propose a dual-branch decoding framework that synthesizes strong and weak facial activity synchronously, which allows facial animation to be synthesized over a wider range of intensities. Furthermore, a weighted hierarchical feature encoder is proposed to establish the temporal correlation between hierarchical speech features and facial activity at different intensities, which ensures lip-sync and plausible facial expressions. Extensive qualitative and quantitative experiments, as well as a user study, indicate that CorrTalk outperforms existing state-of-the-art methods. The source code and supplementary video are publicly available at: https://zjchu.github.io/projects/CorrTalk/.
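The abstract does not state how the facial activity intensity prior is computed beyond "statistically analyzing facial animations". As a rough illustration only, the sketch below assumes the prior is the per-vertex standard deviation of displacement from a neutral template, and that a quantile threshold separates strong from weak regions. The function name `activity_intensity_prior`, the `quantile` parameter, and the toy data are all hypothetical and are not taken from CorrTalk.

```python
import numpy as np

def activity_intensity_prior(vertices, template, quantile=0.8):
    """Hypothetical per-vertex activity statistic (an assumption, not CorrTalk's definition).

    vertices: (T, V, 3) animated vertex positions over T frames.
    template: (V, 3) neutral face.
    Returns (intensity, strong_mask), both of shape (V,).
    """
    # Per-frame displacement magnitude of each vertex from the neutral face: (T, V)
    displacement = np.linalg.norm(vertices - template[None], axis=-1)
    # Temporal standard deviation as a simple "activity intensity" measure: (V,)
    intensity = displacement.std(axis=0)
    # Vertices above the chosen quantile are treated as strongly active
    threshold = np.quantile(intensity, quantile)
    strong_mask = intensity > threshold
    return intensity, strong_mask

# Toy data: 2 strongly moving "mouth" vertices and 8 nearly static ones
rng = np.random.default_rng(0)
T, V = 50, 10
template = rng.normal(size=(V, 3))
motion = np.zeros((T, V, 3))
motion[:, :2] = rng.normal(scale=1.0, size=(T, 2, 3))
motion[:, 2:] = rng.normal(scale=0.01, size=(T, 8, 3))
vertices = template[None] + motion

intensity, strong_mask = activity_intensity_prior(vertices, template)
```

Under these assumptions, a mask like `strong_mask` could route vertices to one of two decoding branches; how CorrTalk actually partitions the face is described in the paper and its released code, not here.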
Pages: 8953-8965
Number of pages: 13