A framework towards expressive speech analysis and synthesis with preliminary results

Cited by: 1
Authors
Raptis, Spyros [1]
Karabetsos, Sotiris [1, 2]
Chalamandaris, Aimilios [1]
Tsiakoulis, Pirros [1]
Affiliations
[1] Inst Language & Speech Proc, Athena Res Ctr, Voice & Sound Technol Dept, Athens 15125, Greece
[2] Technol Educ Inst TEI Athens, Dept Elect Engn, Athens 12243, Egaleo, Greece
Keywords
Emotion classification; Emotional speech; Expressive speech; Text to speech; Acoustic analysis; Speech synthesis; MODELS
DOI
10.1007/s12193-015-0186-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Emotion-aware computing presents one of the key challenges in contemporary research on natural human interaction, in which emotional speech is an essential modality of multimodal user interfaces. The speech modality relates mainly to speech emotion and affect recognition, as well as to near-natural expressive speech synthesis, the latter being considered one of the next significant milestones in speech synthesis technology. A problem common to both recognizing and generating affective and emotional speech content is the methodology adopted for emotion analysis and modeling. This work proposes a generalized framework for annotating, analyzing and modeling expressive speech in a data-driven machine learning approach, aimed at building expressive text-to-speech synthesis systems. To this end, the framework and the data-driven methodology are described, comprising the techniques and approaches for acoustic analysis and expression clustering. In addition, the work presents the online experimental tools deployed for speech perception and annotation, describes the speech data used, and reports initial experimental results, which illustrate the potential of the proposed framework and provide encouraging indications for further research.
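The abstract names acoustic analysis and expression clustering as components of the framework, but this record does not describe the specific algorithms. The following is a purely illustrative sketch, not the authors' method. It assumes each utterance is summarized by a hypothetical three-dimensional prosodic vector (mean F0 in Hz, mean energy in dB, speech rate in syllables per second) and uses plain k-means as a stand-in clustering technique to group utterances by expressive style without labels. The feature values below are invented toy numbers, not data from the paper.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def zscore(features: Sequence[Vector]) -> List[Vector]:
    # Standardize each dimension to mean 0 and unit variance, so that F0 in Hz
    # does not dominate energy in dB or speech rate in syllables/s.
    n, dims = len(features), len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((f[d] - means[d]) ** 2 for f in features) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against zero variance
    return [tuple((f[d] - means[d]) / stds[d] for d in range(dims)) for f in features]


def kmeans(points: List[Vector], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Vector]]:
    # Plain k-means (Lloyd's algorithm), initialized with k distinct data points.
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(m[d] for m in members) / len(members)
                                     for d in range(len(members[0])))
    return labels, centroids


# Hypothetical per-utterance features: (mean F0 [Hz], mean energy [dB], rate [syll/s]).
utterances = [
    (120.0, 60.0, 4.0), (125.0, 62.0, 4.2), (118.0, 59.0, 3.9),  # calmer-sounding
    (220.0, 75.0, 6.0), (230.0, 77.0, 6.3), (215.0, 74.0, 5.8),  # more aroused
]
labels, centroids = kmeans(zscore(utterances), k=2)
print(labels)
```

Standardizing first is a deliberate choice: without it, Euclidean distance would be driven almost entirely by F0, whose numeric range is much larger than that of the other features. In a real system, the number of clusters and the feature set would have to be chosen from the data, for instance by checking the clusters against listener annotations like those the abstract mentions.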
Pages: 387-394
Page count: 8
Related papers
50 in total
  • [31] Generating emphatic speech with hidden Markov model for expressive speech synthesis
    Wu, Zhiyong
    Ning, Yishuang
    Zang, Xiao
    Jia, Jia
    Meng, Fanbo
    Meng, Helen
    Cai, Lianhong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2015, 74 (22) : 9909 - 9925
  • [32] Expressive Speech Synthesis Using Emotion-Specific Speech Inventories
    Zainko, Csaba
    Fek, Mark
    Nemeth, Geza
    VERBAL AND NONVERBAL FEATURES OF HUMAN-HUMAN AND HUMAN-MACHINE INTERACTIONS, 2008, 5042 : 225 - 234
  • [34] Expressive Speech Analysis for Story Telling Application
    Patil, Prerna R.
    Manjare, Chandraprabha A.
    2014 IEEE GLOBAL CONFERENCE ON WIRELESS COMPUTING AND NETWORKING (GCWCN), 2014, : 97 - 101
  • [35] Expressive Prosody for Unit-selection Speech Synthesis
    Strom, Volker
    Clark, Robert
    King, Simon
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1296 - 1299
  • [36] Intonation and Prosody Conversion for Expressive Mandarin Speech Synthesis
    Zhu, Jing
    Yu, Yibiao
    PROCEEDINGS OF 2012 IEEE 11TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP) VOLS 1-3, 2012, : 549 - 552
  • [37] Analysis by Synthesis: Using an expressive TTS Model as Feature Extractor for Paralinguistic Speech Classification
    Schiller, Dominik
    Mertes, Silvan
    van Rijn, Pol
    Andre, Elisabeth
    INTERSPEECH 2021, 2021, : 486 - 490
  • [38] JOINT AND ADVERSARIAL TRAINING WITH ASR FOR EXPRESSIVE SPEECH SYNTHESIS
    Zhang, Kaili
    Gong, Cheng
    Lu, Wenhuan
    Wang, Longbiao
    Wei, Jianguo
    Liu, Dawei
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6322 - 6326
  • [39] SynPaFlex-Corpus: An Expressive French Audiobooks Corpus Dedicated to Expressive Speech Synthesis
    Sini, Aghilas
    Lolive, Damien
    Vidal, Gaelle
    Tahon, Marie
    Delais-Roussarie, Elisabeth
    PROCEEDINGS OF THE ELEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2018), 2018, : 4289 - 4296
  • [40] Expressive Speech Synthesis: Past, Present, and Possible Futures
    Schroeder, Marc
    AFFECTIVE INFORMATION PROCESSING, 2009, : 111 - 126