Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion

Cited by: 0
Authors
Sisman, Berrak [1]
Li, Haizhou [1]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
Keywords
Wavelet transform; prosody analysis; voice conversion
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Thus far, voice conversion studies have focused mainly on spectral conversion. However, speaker identity is also characterized by prosodic features such as the fundamental frequency (F0) and energy contour. We believe that a better understanding of speaker-dependent and speaker-independent prosody features will allow us to devise an analytic approach that handles voice conversion more effectively. We consider that speaker-dependent features reflect the speaker's individuality, while speaker-independent features reflect the expression of linguistic content. Therefore, the former should be converted, while the latter should be carried over from source to target during conversion. To achieve this, we analyze speaker-dependent and speaker-independent prosody patterns at different temporal scales using the wavelet transform. The centrepiece of this paper is the understanding that a speech utterance can be characterized by speaker-dependent and speaker-independent features in its prosodic manifestations. Experiments show that the proposed prosody analysis scheme consistently improves prosody conversion performance under the sparse representation framework.
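The multi-scale analysis described in the abstract can be illustrated with a minimal sketch. This record does not state which wavelet, scales, or implementation the paper uses, so everything here is an assumption made for illustration: the Mexican hat mother wavelet, the dyadic scales, the edge padding, the helper names `mexican_hat` and `cwt_decompose`, and the synthetic log-F0 contour. It shows only how a continuous wavelet transform separates slow and fast variation in a contour, not the authors' method.

```python
import numpy as np

def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet with unit L2 norm."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)

def cwt_decompose(contour, scales):
    """Return an array of shape (len(scales), len(contour)): one track per scale.

    Scales are in frames; a larger scale responds to slower variation.
    """
    contour = np.asarray(contour, dtype=float)
    tracks = np.empty((len(scales), contour.size))
    for i, s in enumerate(scales):
        half = int(np.ceil(5 * s))                   # truncate the wavelet at +/- 5 scales
        t = np.arange(-half, half + 1) / s
        kernel = mexican_hat(t) / np.sqrt(s)         # scaled, energy-normalized wavelet
        padded = np.pad(contour, half, mode="edge")  # hold edge values (assumed padding choice)
        tracks[i] = np.convolve(padded, kernel, mode="valid")
    return tracks

# Synthetic log-F0 contour (illustrative only): a slow phrase-level component
# plus a faster syllable-level component, one value per frame.
frames = np.arange(400)
slow = 0.3 * np.sin(2 * np.pi * frames / 100)
fast = 0.1 * np.sin(2 * np.pi * frames / 20)
log_f0 = 5.0 + slow + fast

scales = [2, 4, 8, 16]                               # assumed dyadic scales, in frames
coeffs = cwt_decompose(log_f0 - log_f0.mean(), scales)

interior = slice(80, 320)                            # skip frames influenced by padding
for s, track in zip(scales, coeffs):
    print(f"scale {s:2d}: RMS = {np.sqrt(np.mean(track[interior] ** 2)):.3f}")
```

In this toy setup the fine scales respond mostly to the fast component and the coarse scales to the slow one, which is the kind of separation that lets different temporal scales be treated differently during conversion.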
Pages: 52-56
Page count: 5
Related papers
50 in total
  • [31] Voice conversion based on static speaker characteristics
    Schwardt, LC
    du Preez, JA
    PROCEEDINGS OF THE 1998 SOUTH AFRICAN SYMPOSIUM ON COMMUNICATIONS AND SIGNAL PROCESSING: COMSIG '98, 1998: 57-62
  • [32] Prosody modeling and Eigen-Prosody analysis for robust speaker recognition
    Chen, ZH
    Liao, YF
    Juang, YT
    2005 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-5: SPEECH PROCESSING, 2005: 185-188
  • [33] Voice, Articulation, and Prosody Contribute to Listener Perceptions of Speaker Gender: A Systematic Review and Meta-Analysis
    Leung, Yeptain
    Oates, Jennifer
    Chan, Siew Pang
    JOURNAL OF SPEECH LANGUAGE AND HEARING RESEARCH, 2018, 61(02): 266-297
  • [34] Latent prosody analysis for robust speaker identification
    Liao, Yuan-Fu
    Chen, Zi-He
    Juang, Yau-Tarng
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15(06): 1870-1883
  • [35] COMPARISON OF SPEAKER DEPENDENT AND SPEAKER INDEPENDENT EMOTION RECOGNITION
    Rybka, Jan
    Janicki, Artur
    INTERNATIONAL JOURNAL OF APPLIED MATHEMATICS AND COMPUTER SCIENCE, 2013, 23(04): 797-808
  • [36] ON PROSODY MODELING FOR ASR+TTS BASED VOICE CONVERSION
    Huang, Wen-Chin
    Hayashi, Tomoki
    Li, Xinjian
    Watanabe, Shinji
    Toda, Tomoki
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021: 642-649
  • [37] Speaker-independent HMM-based voice conversion using adaptive quantization of the fundamental frequency
    Nose, Takashi
    Kobayashi, Takao
    SPEECH COMMUNICATION, 2011, 53(07): 973-985
  • [38] Voice Conversion Based on Improved GMM and Spectrum with Synchronous Prosody
    Zhang Bing
    Yu Yibiao
    ICSP: 2008 9TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-5, PROCEEDINGS, 2008: 659-662
  • [39] Towards Fine-Grained Prosody Control for Voice Conversion
    Lian, Zheng
    Zhong, Rongxiu
    Wen, Zhengqi
    Liu, Bin
    Tao, Jianhua
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021
  • [40] Jointly Trained Conversion Model With LPCNet for Any-to-One Voice Conversion Using Speaker-Independent Linguistic Features
    Himawan, Ivan
    Wang, Ruizhe
    Sridharan, Sridha
    Fookes, Clinton
    IEEE ACCESS, 2022, 10: 134029-134037