Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis

Cited by: 0

Authors:

Matsunaga, Yuta [1]

Saeki, Takaaki [1]

Takamichi, Shinnosuke [1]

Saruwatari, Hiroshi [1]

Affiliation:

[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan

Source:

PROCEEDINGS OF 2022 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2022

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We present a comprehensive empirical study of personalized spontaneous speech synthesis grounded in linguistic knowledge. With the advent of voice cloning for reading-style speech synthesis, a new voice-cloning paradigm is required for human-like, spontaneous speech synthesis. We therefore focus on personalized spontaneous speech synthesis that can clone both an individual's voice timbre and their speech disfluency. Specifically, we deal with filled pauses, a major source of speech disfluency that is known in psychology and linguistics to play an important role in speech generation and communication. To compare personalized filled pause insertion with non-personalized filled pause prediction, we developed a speech synthesis method that uses a non-personalized external filled pause predictor trained on a multi-speaker corpus. The results clarify the position-word entanglement of filled pauses: in the evaluation of synthesized speech, precisely predicting filled pause positions is necessary for naturalness, while precisely predicting filled pause words is necessary for individuality.

Pages: 1898-1903

Number of pages: 6
