Expressive Visual Text-To-Speech Using Active Appearance Models

Cited by: 40
Authors
Anderson, Robert [1 ]
Stenger, Bjoern [2 ]
Wan, Vincent [2 ]
Cipolla, Roberto [1 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge, England
[2] Toshiba Res Europe, Cambridge, England
Source
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013
Keywords
SYNTHETIC TALKING FACES
DOI
10.1109/CVPR.2013.434
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.
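The abstract describes the face model only at a high level, and this record carries no implementation detail. As a rough, hypothetical illustration of the linear active appearance model (AAM) formulation such a system builds on (shape and appearance each expressed as a mean plus weighted basis modes), here is a minimal Python/NumPy sketch; all class names, dimensions, and basis matrices are placeholders rather than the authors' code, and a real model would learn its bases from annotated video frames.

import numpy as np

class ActiveAppearanceModel:
    # Minimal linear AAM: shape and appearance are each a mean plus a
    # weighted sum of basis modes. Real bases would come from PCA on
    # annotated face frames; the ones used below are random stand-ins.
    def __init__(self, mean_shape, shape_basis, mean_texture, texture_basis):
        self.mean_shape = mean_shape        # (2K,) stacked x/y landmark coordinates
        self.shape_basis = shape_basis      # (2K, Ns) shape modes
        self.mean_texture = mean_texture    # (P,) shape-normalised pixel values
        self.texture_basis = texture_basis  # (P, Nt) appearance modes

    def synthesize(self, shape_params, texture_params):
        # Linear generative step: mean + basis @ parameters.
        shape = self.mean_shape + self.shape_basis @ shape_params
        texture = self.mean_texture + self.texture_basis @ texture_params
        return shape, texture

# Toy usage with random placeholder data (illustration only).
rng = np.random.default_rng(0)
K, P, Ns, Nt = 34, 1024, 10, 20
aam = ActiveAppearanceModel(
    mean_shape=rng.standard_normal(2 * K),
    shape_basis=rng.standard_normal((2 * K, Ns)),
    mean_texture=rng.standard_normal(P),
    texture_basis=rng.standard_normal((P, Nt)),
)
shape, texture = aam.synthesize(rng.standard_normal(Ns), rng.standard_normal(Nt))

In a visual TTS pipeline of the kind the abstract outlines, sequences of such parameter vectors would be predicted from the input text and modulated by the continuous expression weights, with pose and blink state normalized out of the learned modes so they do not introduce artifacts into the synthesized sequence.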
Pages: 3382-3389
Number of pages: 8
Related papers
50 records in total
  • [31] Using text-to-speech processors in embedded applications
    Ibrahim, Dogan, Nexus Media Communications Ltd. (123)
  • [32] TEXT-TO-SPEECH USING LPC ALLOPHONE STRINGING
    LIN, KS
    GOUDIE, KM
    FRANTZ, GA
    BRANTINGHAM, GL
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 1981, 27 (02): 144 - 152
  • [33] Using Text-to-Speech to Prototype Game Dialog
    Engstrom, Henrik
    Ostblad, Per Anders
    COMPUTERS IN ENTERTAINMENT, 2018, 16 (04):
  • [34] ON VISUAL OBJECT TRACKING USING ACTIVE APPEARANCE MODELS
    Hoffmann, M. R.
    Herbst, B. M.
    Hunter, K. M.
    SAIEE AFRICA RESEARCH JOURNAL, 2007, 98 (02): 52 - 58
  • [35] CAMNet: A controllable acoustic model for efficient, expressive, high-quality text-to-speech
    Alvarez, Jesus Monge
    Francois, Holly
    Sung, Hosang
    Choi, Seungdo
    Jeong, Jonghoon
    Choo, Kihyun
    Min, Kyoungbo
    Park, Sangjun
    APPLIED ACOUSTICS, 2022, 186
  • [36] NORMALIZATION OF TEXT MESSAGES FOR TEXT-TO-SPEECH
    Pennell, Deana L.
    Liu, Yang
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4842 - 4845
  • [37] Text and Speech Corpora for Text-To-Speech Synthesis of Tales
    Doukhan, David
    Rosset, Sophie
    Rilliard, Albert
    d'Alessandro, Christophe
    Adda-Decker, Martine
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 1003 - 1010
  • [38] TSP-TTS: Text-based Style Predictor with Residual Vector Quantization for Expressive Text-to-Speech
    Seong, Donghyun
    Lee, Hoyoung
    Chang, Joon-Hyuk
    INTERSPEECH 2024, 2024, : 1780 - 1784
  • [39] Using text-to-speech processors in intelligent embedded designs
    Ibrahim, Dogan
    ELECTRONICS WORLD, 2016, 122 (1964): 14 - 16
  • [40] JAPANESE TEXT-TO-SPEECH SYNTHESIZER
    NAGAKURA, K
    HAKODA, K
    KABEYA, K
    REVIEW OF THE ELECTRICAL COMMUNICATIONS LABORATORIES, 1988, 36 (05): 451 - 457