Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion

Cited by: 0

Authors:

Sisman, Berrak [1]

Li, Haizhou [1]

Affiliations:

[1] Natl Univ Singapore, Singapore, Singapore

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

Wavelet transform; prosody analysis; voice conversion;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Thus far, voice conversion studies are mainly focused on the conversion of spectrum. However, speaker identity is also characterized by its prosody features, such as fundamental frequency (F0) and energy contour. We believe that with a better understanding of speaker dependent/independent prosody features, we can devise an analytic approach that addresses voice conversion in a better way. We consider that speaker dependent features reflect speaker's individuality, while speaker independent features reflect the expression of linguistic content. Therefore, the former is to be converted while the latter is to be carried over from source to target during the conversion. To achieve this, we provide an analysis of speaker dependent and speaker independent prosody patterns in different temporal scales by using wavelet transform. The centrepiece of this paper is based on the understanding that a speech utterance can be characterized by speaker dependent and independent features in its prosodic manifestations. Experiments show that the proposed prosody analysis scheme improves the prosody conversion performance consistently under the sparse representation framework.
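As a rough illustration of what analysing prosody "in different temporal scales by using wavelet transform" can look like, the sketch below decomposes a synthetic log-F0 contour with a continuous wavelet transform. The abstract does not state the mother wavelet, the number of scales, or the frame shift, so the Mexican hat wavelet, the five dyadic scales, the 5 ms frame shift, and the names `mexican_hat` and `decompose_prosody` are all assumptions made for this example, not the authors' configuration.

```python
import numpy as np


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet evaluated at unitless time t."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def decompose_prosody(contour, frame_shift_s=0.005, base_scale_s=0.05, num_scales=5):
    """Wavelet-decompose a prosody contour at dyadic temporal scales.

    All defaults are illustrative assumptions, not values from the paper.
    Returns an array of shape (num_scales, len(contour)); row i holds the
    coefficients at scale base_scale_s * 2**i seconds.
    """
    x = np.asarray(contour, dtype=float)
    x = (x - x.mean()) / (x.std() + 1e-12)  # zero mean, unit variance
    n = len(x)
    coeffs = np.empty((num_scales, n))
    for i in range(num_scales):
        scale = base_scale_s * 2 ** i / frame_shift_s  # scale in frames
        half = int(np.ceil(5 * scale))                 # kernel support: +/- 5 scales
        t = np.arange(-half, half + 1) / scale
        kernel = mexican_hat(t) / np.sqrt(scale)
        # Full convolution, then keep the part aligned with the input frames.
        coeffs[i] = np.convolve(x, kernel)[half:half + n]
    return coeffs


# Synthetic log-F0 contour: a slow phrase-level trend plus a faster
# syllable-level ripple (both made up for this example).
frames = np.arange(2000)  # 10 s at a 5 ms frame shift
log_f0 = 5.0 + 0.2 * np.sin(2 * np.pi * frames / 640) + 0.05 * np.sin(2 * np.pi * frames / 40)
coeffs = decompose_prosody(log_f0)
energy = (coeffs ** 2).mean(axis=1)  # how much each temporal scale contributes
print(energy.round(3))
```

Each row of `coeffs` isolates variation at one temporal scale, so components could in principle be compared across speakers scale by scale. Which scales turn out to be speaker dependent or independent is a finding of the paper and is not something this sketch determines.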


Pages: 52-56

Page count: 5

50 records in total

[1] Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion
Wang, Disong
Liu, Songxiang
Sun, Lifa
Wu, Xixin
Liu, Xunying
Meng, Helen
INTERSPEECH 2021, 2021, : 4813 - 4817
[2] Speaker Anonymity and Voice Conversion Vulnerability: A Speaker Recognition Analysis
Saini, Shalini
Saxena, Nitesh
2023 IEEE CONFERENCE ON COMMUNICATIONS AND NETWORK SECURITY, CNS, 2023,
[3] Transformation of Prosody in Voice Conversion
Sisman, Berrak
Li, Haizhou
Tan, Kay Chen
2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017, : 1588 - 1597
[4] Comparison Between Speaker Dependent Mode and Speaker Independent Mode for Voice Recognition
Mrvaljevic, Nikola
Sun, Ying
2009 35TH ANNUAL NORTHEAST BIOENGINEERING CONFERENCE, 2009, : 187 - 188
[5] Speaker-Independent Emotional Voice Conversion via Disentangled Representations
Chen, Xunquan
Xu, Xuexin
Chen, Jinhui
Zhang, Zhizhong
Takiguchi, Tetsuya
Hancock, Edwin R.
IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 7480 - 7493
[6] VOICE CONVERSION IN TIME-INVARIANT SPEAKER-INDEPENDENT SPACE
Nakashika, Toru
Takiguchi, Tetsuya
Ariki, Yasuo
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[7] Hybrid voice conversion of unit selection and generation using prosody dependent HMM
Okubo, Tadashi
Mochizuki, Ryo
Kobayashi, Tetsunori
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2006, E89D (11) : 2775 - 2782
[8] Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines
Nakashika, Toru
Takiguchi, Tetsuya
Ariki, Yasuo
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2014, E97D (06) : 1403 - 1410
[9] A novel method for prosody prediction in voice conversion
Helander, Elina E.
Nurminen, Jani
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 509 - +
[10] Voice conversion by prosody and vocal tract modification
Rao, K. Sreenivasa
Yegnanarayana, B.
ICIT 2006: 9TH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY, PROCEEDINGS, 2006, : 111 - +
