Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion

Authors
Sisman, Berrak [1 ]
Li, Haizhou [1 ]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
Keywords
Wavelet transform; prosody analysis; voice conversion
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Thus far, voice conversion studies have mainly focused on the conversion of the spectrum. However, speaker identity is also characterized by prosody features such as fundamental frequency (F0) and energy contour. We believe that a better understanding of speaker-dependent and speaker-independent prosody features makes it possible to devise an analytic approach that handles voice conversion more effectively. We consider that speaker-dependent features reflect the speaker's individuality, while speaker-independent features reflect the expression of linguistic content. Therefore, the former are to be converted, while the latter are to be carried over from source to target during conversion. To achieve this, we analyze speaker-dependent and speaker-independent prosody patterns at different temporal scales using the wavelet transform. The centrepiece of this paper is the understanding that a speech utterance can be characterized by speaker-dependent and speaker-independent features in its prosodic manifestations. Experiments show that the proposed prosody analysis scheme consistently improves prosody conversion performance under the sparse representation framework.
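As a rough illustration of what analyzing prosody "at different temporal scales" can look like, the sketch below decomposes a contour into a few wavelet components. It is not the authors' implementation: the abstract does not name the mother wavelet, the number of scales, or the preprocessing, so the Mexican hat wavelet, the five dyadic scales, the mean/variance normalization, and the names `mexican_hat` and `cwt_decompose` are all assumptions made for this example. It also assumes a continuous log-F0 contour, that is, one with unvoiced gaps already interpolated.

```python
import numpy as np


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet; constant factor omitted."""
    return (1.0 - t**2) * np.exp(-0.5 * t**2)


def cwt_decompose(contour, num_scales=5, base_scale=2.0):
    """Decompose a 1-D contour into components at dyadic temporal scales.

    Hypothetical helper for illustration only. Returns an array of shape
    (num_scales, len(contour)); row i holds the wavelet response at scale
    base_scale * 2**i, measured in frames.
    """
    x = np.asarray(contour, dtype=float)
    # Zero-mean, unit-variance normalization (assumed preprocessing).
    x = (x - x.mean()) / (x.std() + 1e-8)
    n = len(x)
    out = np.empty((num_scales, n))
    for i in range(num_scales):
        s = base_scale * 2.0**i
        # Truncate the kernel so it is never longer than the signal.
        half = int(min(5 * s, (n - 1) // 2))
        t = np.arange(-half, half + 1) / s
        kernel = mexican_hat(t) / np.sqrt(s)
        out[i] = np.convolve(x, kernel, mode="same")
    return out


# Synthetic contour: slow declination plus a faster oscillation.
frames = np.arange(400)
log_f0 = 5.0 - 0.001 * frames + 0.05 * np.sin(2 * np.pi * frames / 20)
components = cwt_decompose(log_f0)
print(components.shape)  # (5, 400)
```

Small scales respond to fast, local movements of the contour and large scales to slow, utterance-level trends. Which scales count as speaker dependent or speaker independent is a finding of the paper and is not encoded in this sketch.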
Pages: 52-56
Page count: 5
Related papers
50 in total
  • [1] Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion
    Wang, Disong
    Liu, Songxiang
    Sun, Lifa
    Wu, Xixin
    Liu, Xunying
    Meng, Helen
    INTERSPEECH 2021, 2021, : 4813 - 4817
  • [2] Speaker Anonymity and Voice Conversion Vulnerability: A Speaker Recognition Analysis
    Saini, Shalini
    Saxena, Nitesh
    2023 IEEE CONFERENCE ON COMMUNICATIONS AND NETWORK SECURITY, CNS, 2023,
  • [3] Transformation of Prosody in Voice Conversion
    Sisman, Berrak
    Li, Haizhou
    Tan, Kay Chen
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1588 - 1597
  • [4] Comparison Between Speaker Dependent Mode and Speaker Independent Mode for Voice Recognition
    Mrvaljevic, Nikola
    Sun, Ying
    2009 35TH ANNUAL NORTHEAST BIOENGINEERING CONFERENCE, 2009, : 187 - 188
  • [5] Speaker-Independent Emotional Voice Conversion via Disentangled Representations
    Chen, Xunquan
    Xu, Xuexin
    Chen, Jinhui
    Zhang, Zhizhong
    Takiguchi, Tetsuya
    Hancock, Edwin R.
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7480 - 7493
  • [6] VOICE CONVERSION IN TIME-INVARIANT SPEAKER-INDEPENDENT SPACE
    Nakashika, Toru
    Takiguchi, Tetsuya
    Ariki, Yasuo
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [7] Hybrid voice conversion of unit selection and generation using prosody dependent HMM
    Okubo, Tadashi
    Mochizuki, Ryo
    Kobayashi, Tetsunori
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (11) : 2775 - 2782
  • [8] Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines
    Nakashika, Toru
    Takiguchi, Tetsuya
    Ariki, Yasuo
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (06) : 1403 - 1410
  • [9] A novel method for prosody prediction in voice conversion
    Helander, Elina E.
    Nurminen, Jani
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 509 - +
  • [10] Voice conversion by prosody and vocal tract modification
    Rao, K. Sreenivasa
    Yegnanarayana, B.
    ICIT 2006: 9TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY, PROCEEDINGS, 2006, : 111 - +