Empirical Study Incorporating Linguistic Knowledge on Filled Pauses for Personalized Spontaneous Speech Synthesis

Cited by: 0
Authors
Matsunaga, Yuta [1]
Saeki, Takaaki [1]
Takamichi, Shinnosuke [1]
Saruwatari, Hiroshi [1]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Tokyo, Japan
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a comprehensive empirical study of personalized spontaneous speech synthesis based on linguistic knowledge. With the advent of voice cloning for reading-style speech synthesis, a new voice cloning paradigm for human-like, spontaneous speech synthesis is required. We therefore focus on personalized spontaneous speech synthesis that can clone both an individual's voice timbre and speech disfluency. Specifically, we deal with filled pauses, a major source of speech disfluency, which are known in psychology and linguistics to play an important role in speech generation and communication. To comparatively evaluate personalized filled pause insertion and non-personalized filled pause prediction methods, we developed a speech synthesis method with a non-personalized external filled pause predictor trained on a multi-speaker corpus. The results clarify the position-word entanglement of filled pauses, i.e., the necessity of precisely predicting positions for naturalness and of precisely predicting words for individuality in the evaluation of synthesized speech.
Pages: 1898-1903
Page count: 6