Joint Learning of Facial Expression and Head Pose from Speech

Cited by: 16
Authors
Greenwood, David [1 ]
Matthews, Iain [1 ]
Laycock, Stephen [1 ]
Affiliations
[1] Univ East Anglia, Sch Comp Sci, Norwich, Norfolk, England
Keywords
Speech Animation; Deep Learning; LSTM; BLSTM; RNN; Audiovisual Speech; Shape Modelling; Lip Sync; Uncanny Valley; Visual Prosody;
DOI
10.21437/Interspeech.2018-2587
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Natural movement plays a significant role in realistic speech animation, and numerous studies have demonstrated the contribution visual cues make to how acceptable human observers find an animation. Natural, expressive, emotive, and prosodic speech exhibits motion patterns in the visual modality that vary considerably and are difficult to predict. Recently, there have been some impressive demonstrations of face animation derived in some way from the speech signal. Each of these methods has taken a unique approach, but none has included rigid head pose in its predicted output. We observe a high degree of correspondence between facial activity and rigid head pose during speech, and exploit this observation to jointly learn full-face animation together with head pose rotation and translation. From our own corpus, we train Deep Bi-Directional LSTMs (BLSTMs), capable of learning long-term structure in language, to model the relationship between speech and the complex activity of the face. We define a model architecture that encourages learning of rigid head motion via the latent space of the speaker's facial activity. The result is a model that can predict lip sync and other facial motion, along with rigid head motion, directly from audible speech.
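To make the architecture described above concrete, the sketch below shows a stacked BLSTM that maps a sequence of audio features to per-frame facial shape parameters, with rigid head pose (rotation and translation) predicted from that facial output so the two are learned jointly. This is a minimal PyTorch sketch under assumed dimensions; the feature sizes, layer counts, and the class name SpeechToFaceAndPose are illustrative assumptions, not the authors' released code.

# Minimal sketch (illustrative, not the authors' implementation): a deep
# BLSTM maps audio features to facial shape parameters per frame, and a
# second head predicts rigid head pose (3 rotation + 3 translation values)
# from the facial latent output, so pose is learned via facial activity.
import torch
import torch.nn as nn

class SpeechToFaceAndPose(nn.Module):
    def __init__(self, n_audio=26, n_face=30, n_pose=6, hidden=256, layers=3):
        super().__init__()
        # Deep bidirectional LSTM over the audio feature sequence.
        self.blstm = nn.LSTM(n_audio, hidden, num_layers=layers,
                             bidirectional=True, batch_first=True)
        # Per-frame facial activity (e.g. shape-model parameters).
        self.face_head = nn.Linear(2 * hidden, n_face)
        # Rigid head pose predicted from the facial parameters, tying
        # pose learning to the facial latent space.
        self.pose_head = nn.Linear(n_face, n_pose)

    def forward(self, audio):          # audio: (batch, frames, n_audio)
        h, _ = self.blstm(audio)       # h: (batch, frames, 2 * hidden)
        face = self.face_head(h)       # facial parameters per frame
        pose = self.pose_head(face)    # rotation + translation per frame
        return face, pose

A joint training objective, for example the sum of mean-squared errors on the facial parameters and the pose values, would then couple the two outputs during learning.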
Pages: 2484-2488
Number of pages: 5