Expressive Visual Text-To-Speech Using Active Appearance Models

Cited by: 40

Authors:

Anderson, Robert [1]

Stenger, Bjoern [2]

Wan, Vincent [2]

Cipolla, Roberto [1]

Affiliations:

[1] Univ Cambridge, Dept Engn, Cambridge, England

[2] Toshiba Res Europe, Cambridge, England

Source:

2013 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2013

Keywords:

SYNTHETIC TALKING FACES;

DOI:

10.1109/CVPR.2013.434

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a complete system for expressive visual text-to-speech (VTTS), which is capable of producing expressive output, in the form of a 'talking head', given an input text and a set of continuous expression weights. The face is modeled using an active appearance model (AAM), and several extensions are proposed which make it more applicable to the task of VTTS. The model allows for normalization with respect to both pose and blink state which significantly reduces artifacts in the resulting synthesized sequences. We demonstrate quantitative improvements in terms of reconstruction error over a million frames, as well as in large-scale user studies, comparing the output of different systems.

Pages: 3382-3389

Page count: 8

