Comparison of Multi-Scale Speaker Vectors and S-Vectors for Zero-Shot Speech Synthesis

Cited by: 0
Authors
Cory, Tristin [1]
Iqbal, Razib [1]
Affiliation
[1] Missouri State Univ, Dept Comp Sci, Springfield, MO 65897 USA
Source
2022 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA (ISM), 2022
Keywords
speaker adaptation; speaker embedding; speaker encoder; text to speech
DOI
10.1109/ISM55400.2022.00055
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We compare a novel speaker encoder model, called Multi-Scale Speaker (MSS) Vectors, with the state-of-the-art s-vectors model for zero-shot speech synthesis. The s-vectors model is built on a modified transformer self-attention network. The MSS vectors model extends the s-vectors model with a multi-scale approach. Results demonstrate that, for speakers unseen during training, our model produces synthesized speech that is more natural and sounds more similar to the target speaker in a zero-shot speech synthesis system.
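The abstract describes both encoders only at a high level. As a rough illustration, here is a minimal plain-Python sketch, assuming a speaker encoder made of one single-head self-attention layer followed by mean pooling over time, and assuming "multi-scale" means encoding the same utterance at several temporal resolutions (frames average-downsampled by factors 1, 2 and 4) and concatenating the per-scale vectors. All names (`SelfAttentionPooling`, `average_downsample`, `multi_scale_embedding`) and the scale factors are hypothetical; this is not the s-vectors or MSS architecture from the paper, whose details are not given in this record, and the weights are random rather than trained.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


class SelfAttentionPooling:
    """Hypothetical encoder: single-head self-attention over frames, then mean pooling.

    Weights are random here; a real speaker encoder would learn them.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.wq = random_matrix(dim, dim, rng)
        self.wk = random_matrix(dim, dim, rng)
        self.wv = random_matrix(dim, dim, rng)

    def __call__(self, frames: Matrix) -> List[float]:
        q, k, v = matmul(frames, self.wq), matmul(frames, self.wk), matmul(frames, self.wv)
        scale = math.sqrt(self.dim)
        # Scaled dot-product attention: every frame attends to every frame.
        scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
        attended = matmul([softmax(row) for row in scores], v)
        # Mean-pool the attended frames into one fixed-length utterance vector.
        pooled = [sum(col) / len(attended) for col in zip(*attended)]
        return l2_normalize(pooled)


def average_downsample(frames: Matrix, factor: int) -> Matrix:
    """Average non-overlapping windows of `factor` frames (a coarser time scale)."""
    out = []
    for start in range(0, len(frames), factor):
        window = frames[start:start + factor]
        out.append([sum(col) / len(window) for col in zip(*window)])
    return out


def multi_scale_embedding(frames: Matrix, encoder: SelfAttentionPooling,
                          factors: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Concatenate one unit-norm embedding per time scale (hypothetical scheme)."""
    embedding: List[float] = []
    for f in factors:
        embedding.extend(encoder(average_downsample(frames, f)))
    return embedding


if __name__ == "__main__":
    rng = random.Random(42)
    utterance = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(20)]  # 20 frames x 8 features
    enc = SelfAttentionPooling(dim=8)
    print(len(multi_scale_embedding(utterance, enc)))  # 24 = 8 dims x 3 scales
```

Sharing one encoder across scales keeps the sketch small; a real multi-scale design might instead use separate weights per scale or fuse the scales differently.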
Pages: 247-248
Page count: 2
Related papers
50 in total
  • [1] Multi-Scale Speaker Vectors for Zero-Shot Speech Synthesis
    Cory, Tristin
    Iqbal, Razib
    2022 IEEE 46TH ANNUAL COMPUTERS, SOFTWARE, AND APPLICATIONS CONFERENCE (COMPSAC 2022), 2022, pp. 496-501
  • [2] Normalization Driven Zero-shot Multi-Speaker Speech Synthesis
    Kumar, Neeraj
    Goel, Srishti
    Narang, Ankur
    Lall, Brejesh
    INTERSPEECH 2021, 2021, pp. 1354-1358
  • [3] Zero-Shot Normalization Driven Multi-Speaker Text to Speech Synthesis
    Kumar, Neeraj
    Narang, Ankur
    Lall, Brejesh
    IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2022, vol. 30, pp. 1679-1693
  • [4] Towards Zero-Shot Multi-Speaker Multi-Accent Text-to-Speech Synthesis
    Zhang, Mingyang
    Zhou, Xuehao
    Wu, Zhizheng
    Li, Haizhou
    IEEE SIGNAL PROCESSING LETTERS, 2023, vol. 30, pp. 947-951
  • [5] Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations
    Wang, Wenbin
    Song, Yang
    Jha, Sanjay
    INTERSPEECH 2023, 2023, pp. 4454-4458
  • [6] S-Vectors and TESA: Speaker Embeddings and a Speaker Authenticator Based on Transformer Encoder
    Mary, Narla John Metilda Sagaya
    Umesh, Srinivasan
    Katta, Sandesh Varadaraju
    IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, 2022, vol. 30, pp. 404-413
  • [7] Hierarchical Timbre-Cadence Speaker Encoder for Zero-shot Speech Synthesis
    Lee, Joun Yeop
    Bae, Jae-Sung
    Mun, Seongkyu
    Lee, Jihwan
    Lee, Ji-Hyun
    Cho, Hoon-Young
    Kim, Chanwoo
    INTERSPEECH 2023, 2023, pp. 4334-4338
  • [8] Zero-Shot Object Recognition Using Semantic Label Vectors
    Naha, Shujon
    Wang, Yang
    2015 12TH CONFERENCE ON COMPUTER AND ROBOT VISION (CRV 2015), 2015, pp. 94-100
  • [9] Zero-shot Object Detection Based on Dynamic Semantic Vectors
    Li, Haoyu
    Mei, Jilin
    Zhou, Jiancong
    Hu, Yu
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, pp. 9267-9273
  • [10] Multi-scale visual attention for attribute disambiguation in zero-shot learning
    Tian, Long
    Chen, Bo
    Ren, Jie
    Zhang, Hao
    Wu, Zhenhua
    Han, Ning
    Chen, Yuanwei
    Liu, Hongwei
    SIGNAL PROCESSING: IMAGE COMMUNICATION, 2022, vol. 103