Narrator or Character: Voice Modulation in an Expressive Multi-speaker TTS

Cited by: 1
Authors
Kalyan, T. Pavan [1 ]
Rao, Preeti [1 ]
Jyothi, Preethi [1 ]
Bhattacharyya, Pushpak [1 ]
Affiliations
[1] Indian Institute of Technology, Mumbai, Maharashtra, India
Source
INTERSPEECH 2023
Keywords
Expressive TTS; speech synthesis; new TTS corpus; prosody modelling
DOI
10.21437/Interspeech.2023-2469
CLC Classification Number
O42 [Acoustics]
Discipline Codes
070206; 082403
Abstract
Current Text-to-Speech (TTS) systems are trained on audiobook data and perform well at synthesizing read-style speech. In this work, we are interested in synthesizing audio stories as narrated to children. The storytelling style is more expressive and requires perceptible changes of voice between the narrator and the story characters. To address these challenges, we present a new TTS corpus of English audio stories for children, comprising 32.7 hours of speech by a single female speaker with a UK accent. We provide evidence of salient differences in the suprasegmentals of narrator and character utterances in the dataset, motivating the use of a multi-speaker TTS for our application. We use a fine-tuned BERT model to label each sentence as spoken by either the narrator or a character, and this label is then used to condition the TTS output. Experiments show our new TTS system is superior in expressiveness, in both A-B preference and MOS testing, to reading-style TTS and single-speaker TTS.
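The narrator-vs-character conditioning described in the abstract can be illustrated with a minimal sketch. The paper uses a fine-tuned BERT classifier for this step; the stand-in below instead uses a simple quotation-based heuristic (the function name, threshold, and story text are all illustrative, not from the paper), producing per-sentence speaker labels that a multi-speaker TTS could be conditioned on:

```python
import re

def label_sentences(sentences):
    """Assign a speaker label to each sentence: 'character' if the
    sentence is mostly quoted dialogue, else 'narrator'.
    A rule-based stand-in for the paper's fine-tuned BERT classifier."""
    labels = []
    for s in sentences:
        # Total length of text inside double quotes in this sentence.
        quoted_len = sum(len(q) for q in re.findall(r'"([^"]*)"', s))
        # Heuristic: majority-quoted sentences are treated as character speech.
        labels.append("character" if quoted_len > len(s) / 2 else "narrator")
    return labels

story = [
    'The wolf crept closer to the little house.',
    '"Little pig, little pig, let me come in!"',
    '"Not by the hair on my chinny chin chin," said the pig.',
]
# Each label would be mapped to a speaker ID conditioning the TTS output.
print(label_sentences(story))  # ['narrator', 'character', 'character']
```

In the actual system the labels come from a sentence-level classifier rather than surface punctuation, which matters for mixed sentences like the third one above (part quote, part narration), where a learned model can use lexical context instead of a character-count threshold.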
Pages: 4808 - 4812
Page count: 5
Related Papers
50 items in total
  • [1] LIMMITS'24: Multi-Speaker, Multi-Lingual INDIC TTS With Voice Cloning
    Udupa, Sathvik
    Bandekar, Jesuraja
    Singh, Abhayjeet
    Deekshitha, G.
    Kumar, Saurabh
    Badiger, Sandhya
    Nagireddi, Amala
    Roopa, R.
    Ghosh, Prasanta Kumar
    Murthy, Hema A.
    Kumar, Pranaw
    Tokuda, Keiichi
    Hasegawa-Johnson, Mark
    Olbrich, Philipp
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2025, 6 : 293 - 302
  • [2] CAN WE USE COMMON VOICE TO TRAIN A MULTI-SPEAKER TTS SYSTEM?
    Ogun, Sewade
    Colotte, Vincent
    Vincent, Emmanuel
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 900 - 905
  • [3] LIMMITS'24: MULTI-SPEAKER, MULTI-LINGUAL INDIC TTS WITH VOICE CLONING
    Singh, Abhayjeet
    Nagireddi, Amala
    Deekshitha, G.
    Bandekar, Jesuraja
    Roopa, R.
    Badiger, Sandhya
    Udupa, Sathvik
    Ghosh, Prasanta Kumar
    Murthy, Hema A.
    Kumar, Pranaw
    Tokuda, Keiichi
    Hasegawa-Johnson, Mark
    Olbrich, Philipp
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING WORKSHOPS, ICASSPW 2024, 2024, : 61 - 62
  • [4] Unsupervised Speaker and Expression Factorization for Multi-Speaker Expressive Synthesis of Ebooks
    Chen, Langzhou
    Braunschweiler, Norbert
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 1041 - 1045
  • [5] MULTI-SPEAKER MODELING AND SPEAKER ADAPTATION FOR DNN-BASED TTS SYNTHESIS
    Fan, Yuchen
    Qian, Yao
    Soong, Frank K.
    He, Lei
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4475 - 4479
  • [6] Can Speaker Augmentation Improve Multi-Speaker End-to-End TTS?
    Cooper, Erica
    Lai, Cheng-I
    Yasuda, Yusuke
    Yamagishi, Junichi
    INTERSPEECH 2020, 2020, : 3979 - 3983
  • [7] Enhancing Zero-Shot Multi-Speaker TTS with Negated Speaker Representations
    Jeon, Yejin
    Kim, Yunsu
    Lee, Gary Geunbae
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18336 - 18344
  • [8] Comparative Study for Multi-Speaker Mongolian TTS with a New Corpus
    Liang, Kailin
    Liu, Bin
    Hu, Yifan
    Liu, Rui
    Bao, Feilong
    Gao, Guanglai
    APPLIED SCIENCES-BASEL, 2023, 13 (07):
  • [9] Multi-speaker voice cryptographic key generation
    Paola Garcia-Perera, L.
    Carlos Mex-Perera, J.
    Nolazco-Flores, Juan A.
    3RD ACS/IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, 2005
  • [10] Multi-speaker Beamforming for Voice Activity Classification
    Tran, Thuy N.
    Cowley, William
    Pollok, Andre
    2013 AUSTRALIAN COMMUNICATIONS THEORY WORKSHOP (AUSCTW), 2013, : 116 - 121