Word-level Text Markup for Prosody Control in Speech Synthesis

Authors
Korotkova, Yuliya [1, 2]
Kalinovskiy, Ilya [1, 3]
Vakhrusheva, Tatiana [1, 2]
Affiliations
[1] JustAI, St Petersburg, Russia
[2] Higher Sch Econ, Moscow, Russia
[3] Tomsk Polytech Univ, Sch Comp Sci & Robot, Tomsk, Russia
Source
Keywords
prosody control; prosody tagging; word-level prosody; speech synthesis; TTS
DOI
10.21437/Interspeech.2024-715
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Modern Text-to-Speech (TTS) technologies generate speech that is very close to natural speech, but synthesized voices still lack variation in intonation, which is also hard to control. In this work, we address the problem of prosody control, aiming to capture information about intonation in a markup without hand-labeling or linguistic expertise. We propose a method for encoding prosodic knowledge from the textual and acoustic modalities, obtained with models pretrained on self-supervised tasks, into a latent quantized space with interpretable features. The prosodic markup is constructed from these features; it serves as an additional input to the TTS model to address the one-to-many problem and is predicted from text. Moreover, the method allows prosody control at inference time and scales to new data and other languages.
Pages: 2280-2284
Number of pages: 5