Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders

Cited by: 0
Authors
Yu, Yide [1]
Shandiz, Amin Honarmandi [1]
Toth, Laszlo [1]
Affiliations
[1] Univ Szeged, Inst Informat, Szeged, Hungary
Keywords
Real-Time MRI; articulatory-to-acoustic mapping; deep learning; RECOGNITION; ARTICULOGRAPHY; SYSTEM
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Several approaches exist for the recording of articulatory movements, such as electromagnetic and permanent magnetic articulography, ultrasound tongue imaging and surface electromyography. Although magnetic resonance imaging (MRI) is more costly than the above approaches, recent developments in this area now allow the recording of real-time MRI videos of the articulators with an acceptable resolution. Here, we experiment with the reconstruction of the speech signal from a real-time MRI recording using deep neural networks. Instead of estimating speech directly, our networks are trained to output a spectral vector, from which we reconstruct the speech signal using the WaveGlow neural vocoder. We compare the performance of three deep neural architectures for the estimation task, combining convolutional (CNN) and recurrence-based (LSTM) neural layers. Besides the mean absolute error (MAE) of our networks, we also evaluate our models by comparing the speech signals obtained using several objective speech quality metrics such as the mean cepstral distortion (MCD), Short-Time Objective Intelligibility (STOI), Perceptual Evaluation of Speech Quality (PESQ) and Signal-to-Distortion Ratio (SDR). The results indicate that our approach can successfully reconstruct the gross spectral shape, but more improvements are needed to reproduce the fine spectral details.
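The two frame-level error measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function names, the array layout (frames by coefficients), the choice to skip the 0th cepstral coefficient, and the use of the commonly cited cepstral distortion formula in dB are all assumptions made for illustration, and the record does not state how the features were extracted or aligned.

```python
import numpy as np


def mean_absolute_error(pred, ref):
    """MAE between two equally shaped feature arrays (hypothetical helper)."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    if pred.shape != ref.shape:
        raise ValueError("pred and ref must have the same shape")
    return float(np.mean(np.abs(pred - ref)))


def cepstral_distortion_db(pred_cep, ref_cep, skip_c0=True):
    """Average cepstral distortion in dB over time-aligned frames.

    Assumes both inputs have shape (frames, coefficients) and are already
    time-aligned. Uses the commonly cited form
        (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2)
    per frame, averaged over frames. Skipping coefficient 0 (overall energy)
    is an assumption, not something stated in the abstract.
    """
    pred = np.asarray(pred_cep, dtype=float)
    ref = np.asarray(ref_cep, dtype=float)
    if pred.ndim != 2 or pred.shape != ref.shape:
        raise ValueError("inputs must be 2-D arrays of identical shape")
    start = 1 if skip_c0 else 0
    diff = pred[:, start:] - ref[:, start:]
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


if __name__ == "__main__":
    # Synthetic stand-ins for reference and predicted cepstra (not real data).
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(100, 25))
    predicted = reference + 0.1 * rng.normal(size=(100, 25))
    print("MAE:", mean_absolute_error(predicted, reference))
    print("MCD (dB):", cepstral_distortion_db(predicted, reference))
```

STOI, PESQ and SDR are left out of the sketch because they are usually computed with dedicated implementations.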
Pages: 945-949
Number of pages: 5