Expressive Visual Text-To-Speech Using Active Appearance Models

Cited by: 40
Authors
Anderson, Robert [1 ]
Stenger, Bjoern [2 ]
Wan, Vincent [2 ]
Cipolla, Roberto [1 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge, England
[2] Toshiba Res Europe, Cambridge, England
Source
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2013
Keywords
SYNTHETIC TALKING FACES;
DOI
10.1109/CVPR.2013.434
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.
Pages: 3382-3389
Page count: 8
Related papers
50 in total
  • [21] Visual tracking using active appearance models
    Birkbeck, N
    Jagersand, M
    1ST CANADIAN CONFERENCE ON COMPUTER AND ROBOT VISION, PROCEEDINGS, 2004, : 2 - 9
  • [22] LLM-based Expressive Text-to-Speech Synthesizer with Style and Timbre disentanglement
    Zhu, Yuanyuan
    He, Jiaxu
    Jing, Ruihao
    Song, Yaodong
    Lian, Jie
    Zhang, Xiao-Lei
    Li, Jie
    2024 IEEE 14TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, ISCSLP 2024, 2024, : 596 - 600
  • [23] DETECTION AND EMPHATIC REALIZATION OF CONTRASTIVE WORD PAIRS FOR EXPRESSIVE TEXT-TO-SPEECH SYNTHESIS
    Li, Chunrong
    Wu, Zhiyong
    Meng, Fanbo
    Meng, Helen
    Cai, Lianhong
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : 93 - 97
  • [24] EXPRESSIVE VISUAL TEXT TO SPEECH AND EXPRESSION ADAPTATION USING DEEP NEURAL NETWORKS
    Parker, Jonathan
    Maia, Ranniery
    Stylianou, Yannis
    Cipolla, Roberto
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 4920 - 4924
  • [25] TEXT-TO-SPEECH SYNTHESIS
    SPROAT, RW
    OLIVE, JP
    AT&T TECHNICAL JOURNAL, 1995, 74 (02): : 35 - 44
  • [26] Text to visual synthesis with appearance models
    Melenchón, I
    de la Torre, F
    Iriondo, I
    Alías, F
    Martínez, E
    Vicent, H
    2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL 1, PROCEEDINGS, 2003, : 237 - 240
  • [27] The Art of Text-to-Speech
    Lindquist, Benjamin
    CRITICAL INQUIRY, 2024, 50 (02) : 225 - 251
  • [28] Text-to-speech for customers
    Unknown
    EXPERT SYSTEMS, 1998, 15 (01) : 66 - 66
  • [29] Software text-to-speech
    Hallahan W.I.
    International Journal of Speech Technology, 1997, 1 (2) : 121 - 134
  • [30] Using text-to-speech processors in embedded applications
    Ibrahim, Dogan
    ELECTRONICS WORLD, 2017, 123 (1975): : 14 - 16