Comparison of Multi-Scale Speaker Vectors and S-Vectors for Zero-Shot Speech Synthesis

Cited by: 0

Authors:

Cory, Tristin [1]

Iqbal, Razib [1]

Affiliations:

[1] Missouri State Univ, Dept Comp Sci, Springfield, MO 65897 USA

Source:

2022 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM) | 2022

Keywords:

speaker adaptation; speaker embedding; speaker encoder; text to speech

DOI:

10.1109/ISM55400.2022.00055

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We compare a novel speaker encoder model, called Multi-Scale Speaker (MSS) Vectors, with the state-of-the-art s-vectors model for zero-shot speech synthesis. The s-vectors model is built on a modified transformer self-attention network. The MSS vectors model introduces a multi-scale approach to the s-vectors model. Results demonstrate that our model produces more natural and more similar-sounding synthesized speech for unseen speakers in a zero-shot speech synthesis system.

Pages: 247-248

Page count: 2
