Multimodal Unsupervised Speech Translation for Recognizing and Evaluating Second Language Speech

Cited by: 5
Authors
Lee, Yun Kyung [1]
Park, Jeon Gue [1]
Affiliations
[1] Elect & Telecommun Res Inst ETRI, Artificial Intelligence Res Lab, Daejeon 34129, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 06
Keywords
fluency evaluation; speech recognition; data augmentation; variational autoencoder; speech conversion; NONPARALLEL VOICE CONVERSION; BLIND SEPARATION; RECOGNITION;
DOI
10.3390/app11062642
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703 ;
Abstract
This paper addresses automatic proficiency evaluation and speech recognition for second language (L2) speech. The proposed method recognizes the speech uttered by the L2 speaker, measures a variety of fluency scores, and evaluates the proficiency of the speaker's spoken English. Stress and rhythm scores are among the important factors used to evaluate fluency in spoken English and are computed by comparing the stress patterns and the rhythm distributions to those of native speakers. To compute the stress and rhythm scores even when the phonemic sequence of the L2 speaker's English sentence differs from that of the native speaker, we align the phonemic sequences using a dynamic time-warping approach. We also improve the performance of the speech recognition system for non-native speakers and compute fluency features more accurately by augmenting the non-native training dataset and training an acoustic model with the augmented dataset. In this work, we augment the non-native speech by converting some speech signal characteristics (style) while preserving its linguistic information. The proposed variational autoencoder (VAE)-based speech conversion network trains the conversion model by decomposing the spectral features of the speech into a speaker-invariant content factor and a speaker-specific style factor to estimate diverse and robust speech styles. Experimental results show that the proposed method effectively measures the fluency scores and generates diverse output signals. In the proficiency evaluation and speech recognition tests, the proposed method also improves the proficiency score performance and speech recognition accuracy for all proficiency areas compared to a method employing conventional acoustic models.
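The abstract's phoneme alignment step can be illustrated with a minimal dynamic time-warping sketch. This is not the authors' implementation: the function name `dtw_align`, the 0/1 match cost, and the ARPAbet-style phoneme labels are all assumptions chosen for illustration, and the abstract does not specify the local cost or step pattern actually used.

```python
from typing import List, Tuple


def dtw_align(ref: List[str], hyp: List[str]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two non-empty phoneme sequences with textbook DTW.

    Assumed local cost (hypothetical): 0 if the phonemes match, 1 otherwise.
    Returns the cumulative cost and the warping path as (ref_index, hyp_index) pairs.
    """
    n, m = len(ref), len(hyp)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning ref[:i] with hyp[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = 0.0 if ref[i - 1] == hyp[j - 1] else 1.0
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor
    # (diagonal preferred on ties because it is listed first).
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return cost[n][m], path


# Hypothetical example: a native reference vs. an L2 utterance with one
# substituted vowel and one repeated consonant.
native = ["HH", "AH", "L", "OW"]
learner = ["HH", "EH", "L", "L", "OW"]
distance, alignment = dtw_align(native, learner)
```

Once each L2 phoneme is paired with a native phoneme in this way, per-phoneme stress or duration values could be compared along the warping path even though the two sequences differ in length.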
Pages: 16
Related papers
50 items in total
  • [31] Age of learning and second language speech
    Flege, JE
    SECOND LANGUAGE ACQUISITION AND THE CRITICAL PERIOD HYPOTHESIS, 1999, : 101 - 131
  • [32] AwezaMed: A Multilingual, Multimodal Speech-To-Speech Translation Application for Maternal Health Care
    Marais, Laurette
    Louw, Johannes A.
    Badenhorst, Jaco
    Calteaux, Karen
    Wilken, Ilana
    van Niekerk, Nina
    Stein, Glenn
    PROCEEDINGS OF 2020 23RD INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2020), 2020, : 669 - 676
  • [33] Evaluating context-invariance in unsupervised speech representations
    Hallap, Mark
    Dupoux, Emmanuel
    Dunbar, Ewan
    INTERSPEECH 2023, 2023, : 2973 - 2977
  • [34] Structured GMM Based on Unsupervised Clustering for Recognizing Adult and Child Speech
    Gorin, Arseniy
    Jouvet, Denis
    STATISTICAL LANGUAGE AND SPEECH PROCESSING, SLSP 2014, 2014, 8791 : 108 - 119
  • [35] Leveraging unsupervised and weakly-supervised data to improve direct speech-to-speech translation
    Jia, Ye
    Ding, Yifan
    Bapna, Ankur
    Cherry, Colin
    Zhang, Yu
    Conneau, Alexis
    Morioka, Nobuyuki
    INTERSPEECH 2022, 2022, : 1721 - 1725
  • [36] Evaluation of Alternatives on Speech to Sign Language Translation
    San-Segundo, R.
    Perez, A.
    Ortiz, D.
    D'Haro, L. F.
    Torres, M. I.
    Casacuberta, F.
    INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 53 - +
  • [37] Speech to sign language translation system for Spanish
    San-Segundo, R.
    Barra, R.
    Cordoba, R.
    D'Haro, L. F.
    Fernandez, F.
    Ferreiros, J.
    Lucas, J. M.
    Macias-Guarasa, J.
    Montero, J. M.
    Pardo, J. M.
    SPEECH COMMUNICATION, 2008, 50 (11-12) : 1009 - 1020
  • [38] Towards language portability in statistical speech translation
    Waibel, A
    Schultz, T
    Vogel, S
    Fügen, C
    Honal, M
    Kolss, M
    Reichert, J
    Stüker, S
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL III, PROCEEDINGS: IMAGE AND MULTIDIMENSIONAL SIGNAL PROCESSING SPECIAL SESSIONS, 2004, : 765 - 768
  • [39] Language awareness and perception of connected speech in a second language
    Kennedy, Sara
    Blanchet, Josee
    LANGUAGE AWARENESS, 2014, 23 (1-2) : 91 - 105
  • [40] Cross-Language Activation Begins During Speech Planning and Extends Into Second Language Speech
    Jacobs, April
    Fricke, Melinda
    Kroll, Judith F.
    LANGUAGE LEARNING, 2016, 66 (02) : 324 - 353