A framework towards expressive speech analysis and synthesis with preliminary results

Cited by: 1
Authors
Raptis, Spyros [1]
Karabetsos, Sotiris [1, 2]
Chalamandaris, Aimilios [1]
Tsiakoulis, Pirros [1]
Affiliations
[1] Inst Language & Speech Proc, Athena Res Ctr, Voice & Sound Technol Dept, Athens 15125, Greece
[2] Technol Educ Inst TEI Athens, Dept Elect Engn, Athens 12243, Egaleo, Greece
Keywords
Emotion classification; Emotional speech; Expressive speech; Text to speech; Acoustic analysis; Speech synthesis; MODELS;
DOI
10.1007/s12193-015-0186-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Emotion-aware computing presents one of the key challenges in contemporary natural human interaction research in which emotional speech is an essential modality in multimodal user interfaces. Speech modality relates mainly to speech emotion and affect recognition as well as near natural expressive speech synthesis, the latter being considered as one of the next significant milestones in speech synthesis technology. A common problem to recognizing as well as to generating affective and emotional speech content is the adopted methodology on emotion analysis and modeling. This work proposes a generalized framework for annotating, analyzing and modeling expressive speech in a data-driven machine learning approach, towards building expressive text to speech synthesis systems. To this end, the framework as well as the data driven methodology is described, comprised of the techniques and approaches for acoustic analysis and expression clustering. In addition, the deployment of online experimental tools for speech perception and annotation and the description of the utilized speech data together with initial experimental results are also given, depicting the potential of the proposed framework and providing encouraging indications for further research.
Pages: 387-394
Number of pages: 8
Related papers
50 in total
  • [1] A framework towards expressive speech analysis and synthesis with preliminary results
    Spyros Raptis
    Sotiris Karabetsos
    Aimilios Chalamandaris
    Pirros Tsiakoulis
    Journal on Multimodal User Interfaces, 2015, 9 : 387 - 394
  • [2] Towards Expressive Speech Synthesis: Analysis and Modeling of Expressive Speech
    Raptis, Spyros
    Karabetsos, Sotiris
    Chalamandaris, Aimilios
    Tsiakoulis, Pirros
    2014 5th IEEE Conference on Cognitive Infocommunications (CogInfoCom), 2014, : 461 - 465
  • [3] Towards Glottal Source Controllability in Expressive Speech Synthesis
    Lorenzo-Trueba, Jaime
    Barra-Chicote, Roberto
    Raitio, Tuomo
    Obin, Nicolas
    Alku, Paavo
    Yamagishi, Junichi
    Montero, Juan M.
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1618 - 1621
  • [4] A Data-Driven Affective Analysis Framework Toward Naturally Expressive Speech Synthesis
    Bellegarda, Jerome R.
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (05): : 1113 - 1122
  • [5] Towards Multi-Scale Style Control for Expressive Speech Synthesis
    Li, Xiang
    Song, Changhe
    Li, Jingbei
    Wu, Zhiyong
    Jia, Jia
    Meng, Helen
    INTERSPEECH 2021, 2021, : 4673 - 4677
  • [6] The Automatic Analysis by Synthesis of Speech Prosody with Preliminary Results on Mandarin Chinese
    Hirst, Daniel
    2012 8TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING, 2012, : XXIV - XXIV
  • [7] Expressive speech synthesis: A review
    Govind D.
    Prasanna S.R.M.
    International Journal of Speech Technology, 2013, 16 (2) : 237 - 260
  • [8] Speech Variability Compensation for Expressive Speech Synthesis
    Chen, Yan-You
    Kuan, Ta-Wen
    Tsai, Chun-Yu
    Wang, Jhing-Fa
    Chang, Chia-Hao
    1ST INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT 2013), 2013, : 210 - 213
  • [9] Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
    Skerry-Ryan, R. J.
    Battenberg, Eric
    Xiao, Ying
    Wang, Yuxuan
    Stanton, Daisy
    Shor, Joel
    Weiss, Ron J.
    Clark, Rob
    Saurous, Rif A.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [10] Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
    Jiang, Yuepeng
    Li, Tao
    Yang, Fengyu
    Xie, Lei
    Meng, Meng
    Wang, Yujun
    INTERSPEECH 2024, 2024, : 2300 - 2304