Expressive Visual Text-To-Speech Using Active Appearance Models

Cited by: 40
Authors
Anderson, Robert [1]
Stenger, Bjoern [2]
Wan, Vincent [2]
Cipolla, Roberto [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge, England
[2] Toshiba Res Europe, Cambridge, England
Source
2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2013
Keywords
SYNTHETIC TALKING FACES
DOI
10.1109/CVPR.2013.434
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.
Pages: 3382-3389
Page count: 8