Joint Learning of Facial Expression and Head Pose from Speech

Cited by: 16
Authors
Greenwood, David [1 ]
Matthews, Iain [1 ]
Laycock, Stephen [1 ]
Affiliations
[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
Keywords
Speech Animation; Deep Learning; LSTM; BLSTM; RNN; Audiovisual Speech; Shape Modelling; Lip Sync; Uncanny Valley; Visual Prosody;
DOI
10.21437/Interspeech.2018-2587
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Natural movement plays a significant role in realistic speech animation, and numerous studies have demonstrated the contribution visual cues make to how acceptable human observers find an animation. Natural, expressive, emotive, and prosodic speech exhibits motion patterns in the visual modality that vary considerably and are difficult to predict. Recently, there have been some impressive demonstrations of face animation derived in some way from the speech signal. Each of these methods has taken a unique approach, but none has included rigid head pose in its predicted output. We observe a high degree of correspondence between facial activity and rigid head pose during speech, and exploit this observation to jointly learn full-face animation together with head pose rotation and translation. From our own corpus, we train Deep Bi-Directional LSTMs (BLSTMs), capable of learning long-term structure in language, to model the relationship between speech and the complex activity of the face. We define a model architecture that encourages learning of rigid head motion via the latent space of the speaker's facial activity. The result is a model that can predict lip sync and other facial motion, along with rigid head motion, directly from audible speech.
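To make the architecture described above concrete, the sketch below shows a stacked BLSTM that maps a sequence of audio features to per-frame facial shape parameters, with rigid head pose (rotation and translation) predicted from that facial output so the two are learned jointly. This is a minimal PyTorch sketch under assumed dimensions; the feature sizes, layer counts, and the class name SpeechToFaceAndPose are illustrative assumptions, not the authors' released code.

# Minimal sketch (illustrative, not the authors' implementation): a deep
# BLSTM maps audio features to facial shape parameters per frame, and a
# second head predicts rigid head pose (3 rotation + 3 translation values)
# from the facial latent output, so pose is learned via facial activity.
import torch
import torch.nn as nn

class SpeechToFaceAndPose(nn.Module):
    def __init__(self, n_audio=26, n_face=30, n_pose=6, hidden=256, layers=3):
        super().__init__()
        # Deep bidirectional LSTM over the audio feature sequence.
        self.blstm = nn.LSTM(n_audio, hidden, num_layers=layers,
                             bidirectional=True, batch_first=True)
        # Per-frame facial activity (e.g. shape-model parameters).
        self.face_head = nn.Linear(2 * hidden, n_face)
        # Rigid head pose predicted from the facial parameters, tying
        # pose learning to the facial latent space.
        self.pose_head = nn.Linear(n_face, n_pose)

    def forward(self, audio):          # audio: (batch, frames, n_audio)
        h, _ = self.blstm(audio)       # h: (batch, frames, 2 * hidden)
        face = self.face_head(h)       # facial parameters per frame
        pose = self.pose_head(face)    # rotation + translation per frame
        return face, pose

A joint training objective, for example the sum of mean-squared errors on the facial parameters and the pose values, would then couple the two outputs during learning.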
Pages: 2484-2488
Number of pages: 5