Expressive Visual Text-To-Speech Using Active Appearance Models

Cited by: 40
Authors
Anderson, Robert [1]
Stenger, Bjoern [2]
Wan, Vincent [2]
Cipolla, Roberto [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge, England
[2] Toshiba Res Europe, Cambridge, England
Source
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013
Keywords
SYNTHETIC TALKING FACES
DOI
10.1109/CVPR.2013.434
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.
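An AAM represents face shape and appearance each as a mean vector plus a weighted sum of modes of variation learned with PCA, and models are often compared by how well they reconstruct held-out frames. The sketch below illustrates only that generic linear-model step and an RMS reconstruction error on toy data. It is not the authors' implementation: the function names are hypothetical, the RMS metric is an assumption (the abstract does not define its error measure), and the texture warping, pose/blink normalization and expression weights of the full system are not modeled.

```python
import numpy as np

def fit_linear_model(X: np.ndarray, n_modes: int):
    """PCA on the rows of X (one row per frame, e.g. stacked landmark coordinates).

    Returns the mean vector and an (n_modes, d) matrix whose rows are
    orthonormal modes of variation, ordered by decreasing variance.
    """
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centred data.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_modes]

def encode(X: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Project centred frames onto the modes to get model parameters."""
    return (X - mean) @ basis.T

def decode(params: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Synthesize frames as the mean plus a weighted sum of modes."""
    return mean + params @ basis

def rms_reconstruction_error(X: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> float:
    """RMS difference between frames and their encode/decode reconstructions (assumed metric)."""
    recon = decode(encode(X, mean, basis), mean, basis)
    return float(np.sqrt(np.mean((X - recon) ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 200 "frames" of 10-D vectors lying close to a 3-D subspace.
    latent = rng.normal(size=(200, 3))
    mixing = rng.normal(size=(3, 10))
    X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))
    for k in (1, 2, 3, 5):
        mean, basis = fit_linear_model(X, k)
        print(k, rms_reconstruction_error(X, mean, basis))
```

Keeping more modes never increases the error of a PCA model on its training data, so on this toy data the error drops sharply once three modes are retained; a real evaluation would measure error on frames not used for training.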
Pages: 3382-3389
Number of pages: 8