Real-time speech-driven face animation with expressions using neural networks

Cited by: 70
|
Authors
Hong, PY [1 ]
Wen, Z [1 ]
Huang, TS [1 ]
Affiliations
[1] Univ Illinois, Beckman Inst Adv Sci & Technol, Urbana, IL 61801 USA
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 2002, Vol. 13, No. 4
Funding
US National Science Foundation;
关键词
facial deformation modeling; facial motion analysis and synthesis; neural networks; real-time speech-driven; talking face with expressions;
DOI
10.1109/TNN.2002.1021892
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modeling, automatic facial motion analysis, and real-time speech-driven face animation with expressions using neural networks. Based on this framework, we learn a quantitative visual representation of the facial deformations, called the motion units (MUs). A facial deformation can be approximated by a linear combination of the MUs weighted by MU parameters (MUPs). We develop an MU-based facial motion tracking algorithm, which is used to collect an audio-visual training database. Then, we construct a real-time audio-to-MUP mapping by training a set of neural networks on the collected audio-visual training database. The quantitative evaluation of the mapping shows the effectiveness of the proposed approach. Using the proposed method, we develop the functionality of real-time speech-driven face animation with expressions for the iFACE system. Experimental results show that the synthetic expressive talking face of the iFACE system is comparable to a real face in terms of its influence on bimodal human emotion perception.
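The core representation in the abstract can be sketched in a few lines: a facial deformation is approximated as a linear combination of learned motion units (MUs) weighted by MU parameters (MUPs). The sketch below is illustrative only; the dimensions, the neutral-shape term, and all values are assumptions, not the paper's actual learned MUs.

```python
import numpy as np

# Hypothetical illustration of the MU-based deformation model: a face shape is
# approximated as a neutral shape plus a linear combination of motion units
# (MUs) weighted by MU parameters (MUPs). Sizes and values are made up.
rng = np.random.default_rng(0)

n_points = 100  # number of tracked facial feature points (assumed)
n_mus = 7       # number of motion units (assumed)

neutral = rng.normal(size=3 * n_points)          # neutral face geometry (x, y, z stacked)
mus = rng.normal(size=(n_mus, 3 * n_points))     # rows are learned motion units

def deform(mups: np.ndarray) -> np.ndarray:
    """Reconstruct a face shape from MU parameters (MUPs)."""
    return neutral + mups @ mus

# An audio-to-MUP mapping (the neural networks in the paper) would output a
# vector like this for each audio frame; here it is just a fixed example.
mups = np.array([0.5, -0.2, 0.0, 0.1, 0.0, 0.3, -0.1])
face = deform(mups)
```

Because the model is linear in the MUPs, zero MUPs recover the neutral shape, and tracking reduces to estimating one small coefficient vector per frame rather than every vertex position.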
Pages: 916-927
Page count: 12
Related Papers
50 records total
  • [31] A comparison of acoustic coding models for speech-driven facial animation
    Kakumanu, Praveen
    Esposito, Anna
    Garcia, Oscar N.
    Gutierrez-Osuna, Ricardo
    SPEECH COMMUNICATION, 2006, 48 (06) : 598 - 615
  • [32] Towards Real-time Speech Emotion Recognition using Deep Neural Networks
    Fayek, H. M.
    Lech, M.
    Cavedon, L.
    2015 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATION SYSTEMS (ICSPCS), 2015,
  • [33] Speech-driven facial animation with spectral gathering and temporal attention
    Chai, Yujin
    Weng, Yanlin
    Wang, Lvdi
    Zhou, Kun
    FRONTIERS OF COMPUTER SCIENCE, 2022, 16 (03)
  • [35] A study on auditory feature spaces for speech-driven lip animation
    Le-Jan, Guylaine
    Benezeth, Yannick
    Gravier, Guillaume
    Bimbot, Frederic
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2508 - 2511
  • [37] Emotional Speech-Driven Animation with Content-Emotion Disentanglement
    Danecek, Radek
    Chhatre, Kiran
    Tripathi, Shashank
    Wen, Yandong
    Black, Michael
    Bolkart, Timo
    PROCEEDINGS OF THE SIGGRAPH ASIA 2023 CONFERENCE PAPERS, 2023,
  • [38] SynFace—Speech-Driven Facial Animation for Virtual Speech-Reading Support
    Salvi, Giampiero
    Beskow, Jonas
    Al Moubayed, Samer
    Granström, Björn
    EURASIP Journal on Audio, Speech, and Music Processing, 2009
  • [39] Individual 3D face synthesis based on orthogonal photos and speech-driven facial animation
    Shan, SG
    Gao, W
    Yan, J
    Zhang, HM
    Chen, XL
    2000 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL III, PROCEEDINGS, 2000, : 238 - 241
  • [40] FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion
    Stan, Stefan
    Haque, Kazi Injamamul
    Yumak, Zerrin
    15TH ANNUAL ACM SIGGRAPH CONFERENCE ON MOTION, INTERACTION AND GAMES, MIG 2023, 2023,