TWO-STAGE TRAINING METHOD FOR JAPANESE ELECTROLARYNGEAL SPEECH ENHANCEMENT BASED ON SEQUENCE-TO-SEQUENCE VOICE CONVERSION

Times cited: 2
Authors
Ma, Ding [1]
Violeta, Lester Phillip [1]
Kobayashi, Kazuhiro [1]
Toda, Tomoki [1]
Affiliation
[1] Nagoya Univ, Nagoya, Japan
Keywords
sequence-to-sequence voice conversion; electrolaryngeal speech to normal speech; synthetic parallel data; two-stage training
DOI
10.1109/SLT54892.2023.10023033
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater potential than conventional VC models for converting electrolaryngeal (EL) speech to normal speech (EL2SP). However, EL2SP based on seq2seq VC requires a sufficiently large amount of parallel data for model training, and its performance degrades significantly when the training data are insufficient. To address this issue, we propose a novel two-stage training strategy that improves the performance of EL2SP based on seq2seq VC when only a small parallel dataset is available. In contrast to previous studies that rely on high-quality data augmentation, in the first stage we combine a large amount of imperfect synthetic parallel data of EL and normal speech with the original dataset for VC training. In the second stage, the model is trained on the original parallel dataset only. The results show that the proposed method progressively improves the performance of EL2SP based on seq2seq VC.
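The two-stage procedure described in the abstract can be sketched as a simple data schedule. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Pair` representation (string IDs standing in for audio features), the per-stage epoch counts, and the `train_step` callback are all hypothetical placeholders, and the abstract specifies neither how the synthetic EL data are generated nor how long each stage runs.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation of one parallel pair:
# (EL utterance id, normal-speech utterance id). Real training would use features.
Pair = Tuple[str, str]


def two_stage_schedule(
    original: Sequence[Pair],
    synthetic: Sequence[Pair],
    stage1_epochs: int,
    stage2_epochs: int,
) -> List[Tuple[int, List[Pair]]]:
    """Return a (stage, training set) entry for every epoch.

    Stage 1: synthetic parallel data combined with the original parallel data.
    Stage 2: the original parallel data only.
    """
    stage1_data = list(synthetic) + list(original)
    stage2_data = list(original)
    schedule = [(1, stage1_data) for _ in range(stage1_epochs)]
    schedule += [(2, stage2_data) for _ in range(stage2_epochs)]
    return schedule


def run_training(
    schedule: List[Tuple[int, List[Pair]]],
    train_step: Callable[[Pair], None],
) -> None:
    # Assumed callback: performs one seq2seq VC update on a single EL/normal pair.
    # The same model state is carried from stage 1 into stage 2.
    for _stage, data in schedule:
        for pair in data:
            train_step(pair)
```

For example, with 2 original pairs and 10 synthetic pairs, `two_stage_schedule(original, synthetic, 3, 2)` yields three stage-1 epochs over 12 pairs followed by two stage-2 epochs over the 2 original pairs. Keeping the schedule separate from the update step makes it easy to swap in a real seq2seq VC model without changing how the two stages are ordered.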
Pages: 949-954
Number of pages: 6