UniTalker: Scaling up Audio-Driven 3D Facial Animation Through A Unified Model

Cited by: 0
Authors
Fan, Xiangyu [1 ]
Li, Jiaqi [1 ]
Lin, Zhiqian [1 ]
Xiao, Weiye [1 ]
Yang, Lei [1 ]
Affiliations
[1] SenseTime Research, Hong Kong, People's Republic of China
Keywords
Audio-driven; Facial animation; Unified Model;
DOI
10.1007/978-3-031-72940-9_12
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Audio-driven 3D facial animation aims to map input audio to realistic facial motion. Despite significant progress, limitations arise from inconsistent 3D annotations, which restrict previous models to training on a single annotation type and thereby constrain the training scale. In this work, we present UniTalker, a unified model featuring a multi-head architecture designed to effectively leverage datasets with varied annotations. To enhance training stability and ensure consistency among multi-head outputs, we employ three training strategies, namely, PCA, model warm-up, and pivot identity embedding. To expand the training scale and diversity, we assemble A2F-Bench, comprising five publicly available datasets and three newly curated datasets. These datasets span a wide range of audio domains, covering multilingual speech and songs, thereby scaling the training data from the less than 1 hour typical of commonly employed datasets to 18.5 hours. With a single trained UniTalker model, we achieve substantial lip vertex error reductions of 9.2% on the BIWI dataset and 13.7% on Vocaset. Additionally, the pre-trained UniTalker shows promise as a foundation model for audio-driven facial animation tasks. Fine-tuning the pre-trained UniTalker on seen datasets further enhances performance on each dataset, with an average error reduction of 6.3% on A2F-Bench. Moreover, fine-tuning UniTalker on an unseen dataset with only half the data surpasses prior state-of-the-art models trained on the full dataset. The code and dataset are available at the project page (Homepage: https://github.com/X-niper/UniTalker).
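The core idea the abstract describes — one shared backbone with dataset-specific output heads so that datasets with incompatible 3D annotations can be trained jointly — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the encoder, head names, and weight shapes are hypothetical stand-ins; only the vertex counts (23,370 for BIWI meshes, 5,023 for the FLAME topology used by Vocaset) come from the respective datasets.

```python
import numpy as np

# Illustrative multi-head setup: a shared audio encoder feeds
# per-dataset linear heads, each mapping to that dataset's own
# motion space (different vertex counts, hence different dims).
rng = np.random.default_rng(0)

FEATURE_DIM = 64
HEAD_DIMS = {"biwi": 23370 * 3, "vocaset": 5023 * 3}  # vertices x xyz

# Random stand-in weights for the shared encoder and each head.
encoder_W = rng.normal(0.0, 0.01, (FEATURE_DIM, FEATURE_DIM))
heads = {name: rng.normal(0.0, 0.01, (FEATURE_DIM, dim))
         for name, dim in HEAD_DIMS.items()}

def forward(audio_features: np.ndarray, dataset: str) -> np.ndarray:
    """Shared encoding, then routing to the dataset-specific head."""
    shared = np.tanh(audio_features @ encoder_W)  # shared representation
    return shared @ heads[dataset]                # annotation-specific output

frames = rng.normal(size=(10, FEATURE_DIM))       # 10 audio feature frames
biwi_motion = forward(frames, "biwi")
voca_motion = forward(frames, "vocaset")
print(biwi_motion.shape, voca_motion.shape)       # (10, 70110) (10, 15069)
```

Because only the final heads differ, gradients from every dataset update the shared encoder, which is what lets the training scale grow across annotation formats; the PCA, warm-up, and pivot-identity-embedding strategies mentioned above are then needed to keep the heads' outputs consistent.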
Pages: 204-221 (18 pages)
Related Papers (50 in total)
  • [41] Applying AI techniques for transferring 3D facial animation
    Bui, The Duy
    ICTACS 2006: First International Conference on Theories and Applications of Computer Science 2006, 2007, : 135 - 149
  • [42] Analysis of Facial Feature Design for 3D Animation Characters
    Chen, Kuan-Lin
    Chen, I-Ping
    Hsieh, Chi-Min
    VISUAL COMMUNICATION QUARTERLY, 2020, 27 (02) : 70 - 83
  • [43] Vision-based Animation of 3D Facial Avatars
    Cho, Taehoon
    Choi, Jin-Ho
    Kim, Hyeon-Joong
    Choi, Soo-Mi
    2014 INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2014, : 128 - 132
  • [44] Lightweight wrinkle synthesis for 3D facial modeling and animation
    Li, Jun
    Xu, Weiwei
    Cheng, Zhiquan
    Xu, Kai
    Klein, Reinhard
    COMPUTER-AIDED DESIGN, 2015, 58 : 117 - 122
  • [45] 3D facial animation from Chinese text.
    Li, N
    Bu, JJ
    Chen, C
    Liang, RH
    2003 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOLS 1-5, CONFERENCE PROCEEDINGS, 2003, : 3738 - 3743
  • [46] Individual 3D face synthesis based on orthogonal photos and speech-driven facial animation
    Shan, SG
    Gao, W
    Yan, J
    Zhang, HM
    Chen, XL
    2000 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL III, PROCEEDINGS, 2000, : 238 - 241
  • [47] ANALYZING VISIBLE ARTICULATORY MOVEMENTS IN SPEECH PRODUCTION FOR SPEECH-DRIVEN 3D FACIAL ANIMATION
    Kim, Hyung Kyu
    Lee, Sangmin
    Kim, Hak Gu
    Proceedings - International Conference on Image Processing, ICIP, 2024, : 3575 - 3579
  • [48] Speech-driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach
    Pham, Hai X.
    Cheung, Samuel
    Pavlovic, Vladimir
    2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW), 2017, : 2328 - 2336
  • [49] Real-time Individual 3D Facial Animation by Combining Parameterized Model and Muscular Model
    Yu, Jun
    Li, Lingyan
    Zou, Jie
    PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016, : 7088 - 7093
  • [50] A 3D facial model
    Litton, JE
    INTERNATIONAL JOURNAL OF PSYCHOLOGY, 2000, 35 (3-4) : 234 - 234