Hearer model based stress prediction for Chinese TTS system

Authors
Hu, GP [1 ]
Liu, QF [1 ]
Hu, Y [1 ]
Wang, RH [1 ]
Affiliations
[1] Univ Sci & Technol China, Ifly Speech Lab, Hefei 230026, Peoples R China
Keywords
hearer model; stress prediction; speech synthesis
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Listeners often tire when they listen to synthesized speech for a long time, mainly because synthesized speech is too flat and never stresses the focus. In contrast to the traditional TTS research approach of simulating the speaker, this paper approaches stress prediction from the hearer's point of view. An ideal hearer model is first proposed to predict the stress distribution, based on the hypothesis that people speak with limited stress effort and distribute that effort so that the hearer can understand the speaker easily. Then, given limited research resources, the paper modifies the ideal hearer model and presents a practical model. Experiments show that the stress prediction achieves an acceptable rate of 87.36%.
Pages: 161-164
Page count: 4