TWO-STAGE TRAINING METHOD FOR JAPANESE ELECTROLARYNGEAL SPEECH ENHANCEMENT BASED ON SEQUENCE-TO-SEQUENCE VOICE CONVERSION

Cited by: 2
Authors
Ma, Ding [1]
Violeta, Lester Phillip [1]
Kobayashi, Kazuhiro [1]
Toda, Tomoki [1]
Affiliation
[1] Nagoya Univ, Nagoya, Japan
Keywords
sequence-to-sequence voice conversion; electrolaryngeal speech to normal speech; synthetic parallel data; two-stage training
DOI
10.1109/SLT54892.2023.10023033
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater potential than conventional VC models for converting electrolaryngeal (EL) speech to normal speech (EL2SP). However, seq2seq-based EL2SP requires a sufficiently large amount of parallel data for model training, and its performance degrades significantly when the training data are insufficient. To address this issue, we propose a novel two-stage training strategy that improves the performance of seq2seq-based EL2SP when only a small parallel dataset is available. Unlike previous studies, which relied on high-quality data augmentation, we first train the VC model on the original dataset combined with a large amount of imperfect synthetic parallel data of EL and normal speech. Then, a second training stage is conducted using only the original parallel dataset. The results show that the proposed method progressively improves the performance of seq2seq-based EL2SP.
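The abstract gives no training details, so the following is only a minimal sketch of the data schedule it describes: stage 1 trains on the original pairs plus a larger set of imperfect synthetic pairs, and stage 2 continues training on the original pairs alone. Everything named here is hypothetical and not from the paper: `two_stage_train`, `ToyLinearModel` (a one-variable linear regression standing in for a seq2seq VC model), the epoch counts, the batch size, the learning rate, and the +0.5 offset used to make the synthetic data "imperfect". The sketch also assumes that stage 2 fine-tunes the stage-1 parameters rather than starting from scratch.

```python
import random
from typing import Callable, List, Sequence, Tuple

# One parallel pair: (source feature, target feature). In real EL2SP these
# would be EL and normal-speech feature sequences; a float stands in here.
Pair = Tuple[float, float]


def two_stage_train(
    update: Callable[[Sequence[Pair]], None],
    original: List[Pair],
    synthetic: List[Pair],
    stage1_epochs: int,
    stage2_epochs: int,
    batch_size: int = 4,
    seed: int = 0,
) -> List[str]:
    """Hypothetical schedule. Stage 1: train on original + synthetic pairs.
    Stage 2: keep the same parameters and train on the original pairs only.
    Returns the stage label of every update step, for inspection."""
    rng = random.Random(seed)
    log: List[str] = []
    stages = [
        ("stage1", original + synthetic, stage1_epochs),
        ("stage2", list(original), stage2_epochs),
    ]
    for name, data, epochs in stages:
        for _ in range(epochs):
            rng.shuffle(data)  # shuffles a copy, so the caller's lists are untouched
            for i in range(0, len(data), batch_size):
                update(data[i:i + batch_size])
                log.append(name)
    return log


class ToyLinearModel:
    """Hypothetical stand-in for a VC model: y = w * x + b, fit by SGD on MSE."""

    def __init__(self, lr: float = 0.1) -> None:
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def update(self, batch: Sequence[Pair]) -> None:
        # One gradient step on the mean squared error of the mini-batch.
        gw = gb = 0.0
        for x, y in batch:
            err = self.w * x + self.b - y
            gw += 2 * err * x
            gb += 2 * err
        self.w -= self.lr * gw / len(batch)
        self.b -= self.lr * gb / len(batch)


# Small "original" set follows y = 2x + 1; the larger synthetic set is
# deliberately imperfect (offset by +0.5), mimicking flawed synthetic pairs.
original = [(i / 7, 2 * (i / 7) + 1) for i in range(8)]
synthetic = [(i / 23, 2 * (i / 23) + 1.5) for i in range(24)]

model = ToyLinearModel()
log = two_stage_train(model.update, original, synthetic,
                      stage1_epochs=50, stage2_epochs=200)
print(log.count("stage1"), log.count("stage2"))  # 400 400
print(round(model.w, 2), round(model.b, 2))
```

In this toy setting, stage 1 lands between the clean and the offset targets. Stage 2 then pulls the parameters back toward the clean mapping (w close to 2, b close to 1). This illustrates the intuition behind the schedule but says nothing about how large the effect is for an actual seq2seq VC model.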
Pages: 949-954
Number of pages: 6