Rigid head motion in expressive speech animation: Analysis and synthesis

Cited: 104
Authors
Busso, Carlos [1 ]
Deng, Zhigang [1 ]
Grimm, Michael [1 ]
Neumann, Ulrich [1 ]
Narayanan, Shrikanth [1 ]
Affiliations
[1] Univ So Calif, Viterbi Sch Engn, Integrated Media Syst Ctr, Los Angeles, CA 90089 USA
Funding
U.S. National Science Foundation;
DOI
10.1109/TASL.2006.885910
Chinese Library Classification
O42 [Acoustics];
Discipline codes
070206; 082403;
Abstract
Rigid head motion is a gesture that conveys important nonverbal information in human communication, and hence it needs to be appropriately modeled and included in realistic facial animations to effectively mimic human behaviors. In this paper, head motion sequences in expressive facial animations are analyzed in terms of their naturalness and emotional salience in perception. Statistical measures derived from an audiovisual database of synchronized facial gestures and speech reveal characteristic patterns in emotional head motion sequences. Head motion patterns with neutral speech differ significantly from head motion patterns with emotional speech in motion activation, range, and velocity. The results show that head motion provides discriminating information about emotional categories. An approach to synthesize emotional head motion sequences driven by prosodic features is presented, expanding upon our previous framework on head motion synthesis. This method naturally models the specific temporal dynamics of emotional head motion sequences by building hidden Markov models for each emotional category (sadness, happiness, anger, and neutral state). Human raters were asked to assess the naturalness and the emotional content of the facial animations. On average, the synthesized head motion sequences were perceived as even more natural than the original head motion sequences. The results also show that head motion modifies the emotional perception of the facial animation, especially in the valence and activation domains. These results suggest that appropriate head motion not only significantly improves the naturalness of the animation but can also be used to enhance the emotional content of the animation to effectively engage the users.
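The abstract describes building one hidden Markov model per emotional category and using it to generate head motion dynamics. As a rough illustration only (not the authors' implementation, which is driven by prosodic features), the sketch below samples a head-pose sequence, three Euler angles per frame, from a toy two-state Gaussian HMM standing in for a single emotion's model; all parameter values are made up for demonstration.

```python
# Hypothetical sketch: sampling rigid head motion (pitch, yaw, roll per
# frame) from a small per-emotion Gaussian HMM. Not the paper's actual
# model; parameters and state count are illustrative assumptions.
import numpy as np

def sample_head_motion(trans, means, covs, start, n_frames, rng):
    """Sample a head-pose sequence from a Gaussian HMM.

    trans : (K, K) state-transition matrix (rows sum to 1)
    means : (K, 3) per-state mean Euler angles, degrees
    covs  : (K, 3, 3) per-state covariance matrices
    start : (K,) initial state distribution
    """
    K = trans.shape[0]
    state = rng.choice(K, p=start)
    frames = np.empty((n_frames, 3))
    for t in range(n_frames):
        # Emit a pose from the current state's Gaussian, then transition.
        frames[t] = rng.multivariate_normal(means[state], covs[state])
        state = rng.choice(K, p=trans[state])
    return frames

# Toy 2-state model: a near-still state and a more animated nodding state.
rng = np.random.default_rng(0)
trans = np.array([[0.9, 0.1], [0.2, 0.8]])
means = np.array([[0.0, 0.0, 0.0], [5.0, -3.0, 1.0]])
covs = np.stack([np.eye(3) * 0.5, np.eye(3) * 1.0])
start = np.array([1.0, 0.0])

motion = sample_head_motion(trans, means, covs, start, n_frames=120, rng=rng)
print(motion.shape)  # (120, 3)
```

In the paper's setting, the HMM parameters would be estimated per emotion from the audiovisual database, and synthesis would be conditioned on prosodic features rather than sampled freely as here.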
Pages: 1075-1086 (12 pages)
Related papers
50 items total
  • [41] Head Motion Animation using Avatar Gaze Space
    Ramaiah, M. S.
    Vijay, Ankit
    Sharma, Geetika
    Mukerjee, Amitabha
    2013 IEEE VIRTUAL REALITY CONFERENCE (VR), 2013, : 95 - +
  • [42] Controllable Emphatic Speech Synthesis based on Forward Attention for Expressive Speech Synthesis
    Liu, Liangqi
    Hu, Jiankun
    Wu, Zhiyong
    Yang, Song
    Yang, Songfan
    Jia, Jia
    Meng, Helen
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 410 - 414
  • [43] Low-Level Characterization of Expressive Head Motion Through Frequency Domain Analysis
    Ding, Yu
    Shi, Lei
    Deng, Zhigang
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2020, 11 (03) : 405 - 418
  • [44] Prosody modelling of Spanish for expressive speech synthesis
    Iriondo, Ignasi
    Socoro, Joan Claudi
    Alias, Francesc
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 821 - +
  • [45] Specifying affect and emotion for expressive speech synthesis
    Campbell, N
    COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, 2004, 2945 : 395 - 406
  • [46] Editorial -: Special section on expressive speech synthesis
    Campbell, Nick
    Hamza, Wael
    Hoege, Harald
    Tao, Jianhua
    Bailly, Gerard
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (04): : 1097 - 1098
  • [47] Synthesizing Expressive Facial and Speech Animation by Text-to-IPA Translation with Emotional Control
    Stef, Andreea
    Perera, Kaveen
    Shum, Hubert P. H.
    Ho, Edmond S. L.
    2018 12TH INTERNATIONAL CONFERENCE ON SOFTWARE, KNOWLEDGE, INFORMATION MANAGEMENT & APPLICATIONS (SKIMA), 2018, : 39 - +
  • [48] Expressive speech synthesis using sentiment embeddings
    Jauk, Igor
    Lorenzo-Trueba, Jaime
    Yamagishi, Junichi
    Bonafonte, Antonio
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 3062 - 3066
  • [49] Expressive facial speech synthesis on a robotic platform
    Li, Xingyan
    MacDonald, Bruce
    Watson, Catherine I.
    2009 IEEE-RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, 2009, : 5009 - 5014
  • [50] Voice Quality Modelling for Expressive Speech Synthesis
    Monzo, Carlos
    Iriondo, Ignasi
    Socoro, Joan Claudi
SCIENTIFIC WORLD JOURNAL, 2014