UniTalker: Scaling up Audio-Driven 3D Facial Animation Through A Unified Model

Cited by: 0
Authors
Fan, Xiangyu [1 ]
Li, Jiaqi [1 ]
Lin, Zhiqian [1 ]
Xiao, Weiye [1 ]
Yang, Lei [1 ]
Affiliations
[1] SenseTime Research, Hong Kong, People's Republic of China
Keywords
Audio-driven; Facial animation; Unified Model;
DOI
10.1007/978-3-031-72940-9_12
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Audio-driven 3D facial animation aims to map input audio to realistic facial motion. Despite significant progress, limitations arise from inconsistent 3D annotations, restricting previous models to training on specific annotations and thereby constraining the training scale. In this work, we present UniTalker, a unified model featuring a multi-head architecture designed to effectively leverage datasets with varied annotations. To enhance training stability and ensure consistency among multi-head outputs, we employ three training strategies, namely PCA, model warm-up, and pivot identity embedding. To expand the training scale and diversity, we assemble A2F-Bench, comprising five publicly available datasets and three newly curated datasets. These datasets cover a wide range of audio domains, including multilingual speech and songs, scaling the training data from the less than 1 hour typical of commonly employed datasets to 18.5 hours. With a single trained UniTalker model, we achieve substantial lip vertex error reductions of 9.2% on the BIWI dataset and 13.7% on Vocaset. Additionally, the pre-trained UniTalker shows promise as a foundation model for audio-driven facial animation tasks. Fine-tuning the pre-trained UniTalker on seen datasets further enhances performance on each dataset, with an average error reduction of 6.3% on A2F-Bench. Moreover, fine-tuning UniTalker on an unseen dataset with only half the data surpasses prior state-of-the-art models trained on the full dataset. The code and dataset are available at the project page: https://github.com/X-niper/UniTalker.
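The abstract describes a shared backbone with one output head per annotation convention, so that datasets with different mesh topologies can be trained jointly. The following PyTorch snippet is a minimal sketch of that multi-head idea only; the module names, hidden dimensions, and dataset keys are illustrative assumptions and do not reproduce the authors' implementation (see the linked repository for the actual code).

    # Minimal sketch (assumption, not the authors' code): a shared decoder with
    # per-dataset output heads, each predicting its own flattened vertex offsets.
    import torch
    import torch.nn as nn

    class MultiHeadA2F(nn.Module):
        def __init__(self, audio_dim=768, hidden_dim=256, head_out_dims=None):
            super().__init__()
            # Illustrative vertex counts: FLAME-topology meshes (Vocaset, 5023
            # vertices) and BIWI meshes (23370 vertices), flattened to xyz.
            head_out_dims = head_out_dims or {"vocaset": 5023 * 3, "biwi": 23370 * 3}
            # Shared motion decoder operating on per-frame audio features.
            self.shared = nn.Sequential(
                nn.Linear(audio_dim, hidden_dim), nn.ReLU(),
                nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            )
            # One linear head per dataset/annotation convention.
            self.heads = nn.ModuleDict(
                {name: nn.Linear(hidden_dim, dim) for name, dim in head_out_dims.items()}
            )

        def forward(self, audio_features, dataset_name):
            # audio_features: (batch, frames, audio_dim), e.g. from a wav2vec-style encoder.
            h = self.shared(audio_features)
            return self.heads[dataset_name](h)  # (batch, frames, vertices * 3)

    # Usage: route each training batch to the head matching its annotation.
    model = MultiHeadA2F()
    feats = torch.randn(2, 100, 768)          # dummy per-frame audio features
    vocaset_motion = model(feats, "vocaset")  # shape (2, 100, 5023 * 3)

The design point this sketch illustrates is that only the output heads depend on the annotation format, which is what allows heterogeneous datasets to share the bulk of the model's parameters.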
Pages: 204 - 221
Number of pages: 18