TWO-STAGE TRAINING METHOD FOR JAPANESE ELECTROLARYNGEAL SPEECH ENHANCEMENT BASED ON SEQUENCE-TO-SEQUENCE VOICE CONVERSION

Cited by: 2

Authors:

Ma, Ding [1]

Violeta, Lester Phillip [1]

Kobayashi, Kazuhiro [1]

Toda, Tomoki [1]

Affiliation:

[1] Nagoya Univ, Nagoya, Japan

Source:

2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT | 2022

Keywords:

sequence-to-sequence voice conversion; electrolaryngeal speech to normal speech; synthetic parallel data; two-stage training

DOI:

10.1109/SLT54892.2023.10023033

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater potential than conventional VC models for converting electrolaryngeal (EL) speech to normal speech (EL2SP). However, EL2SP based on seq2seq VC requires a sufficiently large amount of parallel data for model training, and its performance degrades significantly when the training data is insufficient. To address this issue, we propose a novel two-stage strategy to optimize the performance of EL2SP based on seq2seq VC when only a small parallel dataset is available. In contrast to previous studies, which rely on high-quality data augmentation, we first combine a large amount of imperfect synthetic parallel data of EL and normal speech with the original dataset for VC training. Then, a second training stage is conducted with the original parallel dataset only. The results show that the proposed method progressively improves the performance of EL2SP based on seq2seq VC.
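The two-stage schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a toy scalar model y = w * x stands in for the seq2seq VC model, and all names (`sgd_epoch`, `two_stage_train`, `real_pairs`, `synthetic_pairs`), the learning rate, and the epoch counts are hypothetical choices made only for illustration. The sketch shows the ordering of the data alone: stage 1 trains on a large, imperfect synthetic set mixed with the small original set, and stage 2 continues training on the original set only.

```python
import random


def sgd_epoch(w, pairs, lr):
    """One pass of SGD for the toy model y = w * x under squared error."""
    for x, y in pairs:
        grad = 2.0 * (w * x - y) * x
        w -= lr * grad
    return w


def two_stage_train(real_pairs, synthetic_pairs,
                    epochs_stage1=20, epochs_stage2=20, lr=0.01, seed=0):
    """Return the toy model weight after stage 1 and after stage 2."""
    rng = random.Random(seed)
    w = 0.0

    # Stage 1: large imperfect synthetic set combined with the original set.
    stage1_data = list(synthetic_pairs) + list(real_pairs)
    for _ in range(epochs_stage1):
        rng.shuffle(stage1_data)
        w = sgd_epoch(w, stage1_data, lr)
    w_after_stage1 = w

    # Stage 2: continue training on the original parallel set only.
    stage2_data = list(real_pairs)
    for _ in range(epochs_stage2):
        rng.shuffle(stage2_data)
        w = sgd_epoch(w, stage2_data, lr)
    return w_after_stage1, w


# Assumed toy data: the "true" mapping has slope 2.0; the synthetic pairs
# are deliberately biased (slope 1.6) to mimic imperfect synthetic data.
xs = [0.5, 1.0, 1.5, 2.0]
real_pairs = [(x, 2.0 * x) for x in xs]
synthetic_pairs = [(x, 1.6 * x) for x in xs] * 50

w_stage1, w_stage2 = two_stage_train(real_pairs, synthetic_pairs)
print("error after stage 1:", abs(w_stage1 - 2.0))
print("error after stage 2:", abs(w_stage2 - 2.0))
```

In this toy setting, stage 1 pulls the weight toward the biased synthetic slope, and stage 2 moves it back toward the slope of the original data. Whether a real seq2seq VC model behaves the same way depends on the model and data, which this sketch does not attempt to reproduce.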


Pages: 949-954

Page count: 6

