Joint Learning of Facial Expression and Head Pose from Speech

Cited: 16
Authors
Greenwood, David [1 ]
Matthews, Iain [1 ]
Laycock, Stephen [1 ]
Institutions
[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
Keywords
Speech Animation; Deep Learning; LSTM; BLSTM; RNN; Audiovisual Speech; Shape Modelling; Lip Sync; Uncanny Valley; Visual Prosody;
DOI
10.21437/Interspeech.2018-2587
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Natural movement plays a significant role in realistic speech animation, and numerous studies have demonstrated how strongly visual cues influence whether human observers find an animation acceptable. Natural, expressive, emotive, and prosodic speech exhibits motion patterns that are difficult to predict and that vary considerably across visual modalities. Recently, there have been some impressive demonstrations of face animation derived in some way from the speech signal. Each of these methods has taken a unique approach, but none has included rigid head pose in its predicted output. We observe a high degree of correspondence between facial activity and rigid head pose during speech, and exploit this observation to jointly learn full face animation together with head pose rotation and translation. From our own corpus, we train deep bi-directional LSTMs (BLSTMs), capable of learning long-term structure in language, to model the relationship between speech and the complex activity of the face. We define a model architecture that encourages rigid head motion to be learned via the latent space of the speaker's facial activity. The result is a model that can predict lip sync and other facial motion, along with rigid head motion, directly from audible speech.
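The architecture the abstract describes, a BLSTM stack over audio features whose facial latent also drives a rigid head pose output, can be sketched roughly as below. This is an illustrative reconstruction, not the authors' implementation: the class name, layer sizes, and feature dimensions (26-dim audio features, 30 facial shape parameters, 6-DoF pose) are assumptions chosen for the sketch.

```python
import torch
import torch.nn as nn

class SpeechToFacePose(nn.Module):
    """Hedged sketch: deep BLSTM mapping an audio feature sequence to
    facial shape parameters plus rigid head pose (3 rotation + 3
    translation). All dimensions are illustrative assumptions."""

    def __init__(self, audio_dim=26, hidden=128, face_dim=30, pose_dim=6):
        super().__init__()
        # Stacked bidirectional LSTM over the audio feature sequence,
        # capturing long-term structure in both directions.
        self.blstm = nn.LSTM(audio_dim, hidden, num_layers=2,
                             bidirectional=True, batch_first=True)
        # Facial activity is predicted from the BLSTM output.
        self.face_head = nn.Linear(2 * hidden, face_dim)
        # Head pose is predicted from the facial parameters, so rigid
        # motion is learned via the facial latent space, as the abstract
        # describes (loosely interpreted here).
        self.pose_head = nn.Linear(face_dim, pose_dim)

    def forward(self, audio):        # audio: (batch, frames, audio_dim)
        h, _ = self.blstm(audio)     # (batch, frames, 2 * hidden)
        face = self.face_head(h)     # (batch, frames, face_dim)
        pose = self.pose_head(face)  # (batch, frames, pose_dim)
        return face, pose

# Example: two clips of 100 audio frames each.
model = SpeechToFacePose()
face, pose = model(torch.randn(2, 100, 26))
print(face.shape, pose.shape)
```

Routing the pose prediction through the facial output (rather than a separate branch off the BLSTM) is one plausible way to realise "learning rigid head motion via the latent space of the speaker's facial activity"; the paper itself should be consulted for the exact architecture.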
Pages: 2484 / 2488
Number of pages: 5
Related Papers
50 records in total
  • [1] Towards unsupervised learning of joint facial landmark detection and head pose estimation
    Zou, Zhiming
    Jia, Dian
    Tang, Wei
    PATTERN RECOGNITION, 2025, 162
  • [2] Joint head pose and facial landmark regression from depth images
    Wang, Jie
    Zhang, Juyong
    Luo, Changwei
    Chen, Falai
    COMPUTATIONAL VISUAL MEDIA, 2017, 3 (03) : 229 - 241
  • [3] Joint Pose and Expression Modeling for Facial Expression Recognition
    Zhang, Feifei
    Zhang, Tianzhu
    Mao, Qirong
    Xu, Changsheng
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 3359 - 3368
  • [4] Head Pose and Expression Transfer using Facial Status Score
    Hosoi, Tomoki
    2017 12TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2017), 2017, : 573 - 580
  • [5] MANIFOLD LEARNING FOR SIMULTANEOUS POSE AND FACIAL EXPRESSION RECOGNITION
    Ptucha, Raymond
    Tsagkatakis, Grigorios
    Savakis, Andreas
    2011 18TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2011
  • [6] An Entertainment Robot Based on Head Pose Estimation and Facial Expression Recognition
    Takahashi, Koichi
    Mitsukura, Yasue
    2012 PROCEEDINGS OF SICE ANNUAL CONFERENCE (SICE), 2012, : 2057 - 2061
  • [7] Pose-robust feature learning for facial expression recognition
    Zhang, Feifei
    Yu, Yongbin
    Mao, Qirong
    Gou, Jianping
    Zhan, Yongzhao
    FRONTIERS OF COMPUTER SCIENCE, 2016, 10 (05) : 832 - 844