A framework towards expressive speech analysis and synthesis with preliminary results

Cited by: 1
Authors
Raptis, Spyros [1 ]
Karabetsos, Sotiris [1 ,2 ]
Chalamandaris, Aimilios [1 ]
Tsiakoulis, Pirros [1 ]
Affiliations
[1] Inst Language & Speech Proc, Athena Res Ctr, Voice & Sound Technol Dept, Athens 15125, Greece
[2] Technol Educ Inst TEI Athens, Dept Elect Engn, Athens 12243, Egaleo, Greece
Keywords
Emotion classification; Emotional speech; Expressive speech; Text to speech; Acoustic analysis; Speech synthesis; Models
DOI
10.1007/s12193-015-0186-3
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Emotion-aware computing is one of the key challenges in contemporary research on natural human interaction, where emotional speech is an essential modality in multimodal user interfaces. The speech modality concerns mainly speech emotion and affect recognition as well as near-natural expressive speech synthesis, the latter regarded as one of the next significant milestones in speech synthesis technology. A problem common to both recognizing and generating affective and emotional speech is the choice of methodology for emotion analysis and modeling. This work proposes a generalized framework for annotating, analyzing, and modeling expressive speech with a data-driven machine learning approach, aimed at building expressive text-to-speech synthesis systems. To this end, the framework and the data-driven methodology are described, comprising the techniques for acoustic analysis and expression clustering. In addition, the deployment of online experimental tools for speech perception and annotation, a description of the speech data used, and initial experimental results are presented, illustrating the potential of the proposed framework and providing encouraging indications for further research.
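As a rough, hedged illustration of the "acoustic analysis and expression clustering" step mentioned in the abstract, the short Python sketch below extracts coarse prosodic and spectral features per utterance and groups the utterances with k-means. The feature set, the file names, and the use of librosa and scikit-learn are assumptions for illustration only, not the authors' actual pipeline.

import numpy as np
import librosa
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

def utterance_features(path):
    # Per-utterance features: F0 mean/spread, energy mean/spread, MFCC means.
    # (Illustrative feature choice, not the paper's exact acoustic analysis.)
    y, sr = librosa.load(path, sr=16000)
    f0, _, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)  # frame-level pitch
    voiced = f0[~np.isnan(f0)]
    if voiced.size == 0:                                  # guard: no voiced frames
        voiced = np.zeros(1)
    rms = librosa.feature.rms(y=y)[0]                     # frame-level energy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)    # spectral envelope
    return np.concatenate([
        [voiced.mean(), voiced.std(), rms.mean(), rms.std()],
        mfcc.mean(axis=1),
    ])

# Hypothetical corpus paths; stand-ins for real annotated expressive recordings.
paths = ["utt_001.wav", "utt_002.wav", "utt_003.wav", "utt_004.wav"]
X = StandardScaler().fit_transform(np.array([utterance_features(p) for p in paths]))

# Unsupervised expression clustering; the number of clusters is a free choice.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(paths, labels)))

In a data-driven framework of this kind, such clusters would typically be compared against listener judgments, for example those collected with the online perception and annotation tools the abstract mentions.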
Pages: 387-394
Page count: 8