Reconstructing Speech from Real-Time Articulatory MRI Using Neural Vocoders

Cited by: 0

Authors:

Yu, Yide [1]

Shandiz, Amin Honarmandi [1]

Toth, Laszlo [1]

Affiliations:

[1] Univ Szeged, Inst Informat, Szeged, Hungary

Source:

29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021) | 2021

Keywords:

Real-Time MRI; articulatory-to-acoustic mapping; deep learning; RECOGNITION; ARTICULOGRAPHY; SYSTEM;

DOI:

Not available

Chinese Library Classification:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

Several approaches exist for the recording of articulatory movements, such as electromagnetic and permanent magnetic articulography, ultrasound tongue imaging and surface electromyography. Although magnetic resonance imaging (MRI) is more costly than the above approaches, recent developments in this area now allow the recording of real-time MRI videos of the articulators with an acceptable resolution. Here, we experiment with the reconstruction of the speech signal from a real-time MRI recording using deep neural networks. Instead of estimating speech directly, our networks are trained to output a spectral vector, from which we reconstruct the speech signal using the WaveGlow neural vocoder. We compare the performance of three deep neural architectures for the estimation task, combining convolutional (CNN) and recurrence-based (LSTM) neural layers. In addition to the mean absolute error (MAE) of our networks, we evaluate our models by comparing the reconstructed speech signals using several objective speech quality metrics, such as the mean cepstral distortion (MCD), Short-Time Objective Intelligibility (STOI), Perceptual Evaluation of Speech Quality (PESQ) and Signal-to-Distortion Ratio (SDR). The results indicate that our approach can successfully reconstruct the gross spectral shape, but further improvements are needed to reproduce the fine spectral details.

Pages: 945 - 949

Page count: 5

50 items in total

[1] Real-time MRI and articulatory coordination in speech
Demolin, D
Hassid, S
Metens, T
Soquet, A
COMPTES RENDUS BIOLOGIES, 2002, 325 (04) : 547 - 556
[2] Speech Synthesis from Articulatory Movements Recorded by Real-time MRI
Otani, Yuto
Sawada, Shun
Ohmura, Hidefumi
Katsurada, Kouichi
INTERSPEECH 2023, 2023, : 127 - 131
[3] A Multimodal Real-Time MRI Articulatory Corpus for Speech Research
Narayanan, Shrikanth
Bresch, Erik
Ghosh, Prasanta
Goldstein, Louis
Katsamanis, Athanasios
Kim, Yoon
Lammert, Adam
Proctor, Michael
Ramanarayanan, Vikram
Zhu, Yinghua
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 844 - +
[4] A Multimodal Real-Time MRI Articulatory Corpus of French for Speech Research
Douros, Ioannis K.
Felblinger, Jacques
Frahm, Jens
Isaieva, Karyna
Joseph, Arun A.
Laprie, Yves
Odille, Freddy
Tsukanova, Anastasiia
Voit, Dirk
Vuissoz, Pierre-Andre
INTERSPEECH 2019, 2019, : 1556 - 1560
[5] A Real-Time MRI Study of Articulatory Setting in Second Language Speech
Benitez, Andres
Ramanarayanan, Vikram
Goldstein, Louis
Narayanan, Shrikanth
15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014, : 701 - 705
[6] SENSORIMOTOR ADAPTATION OF SPEECH USING REAL-TIME ARTICULATORY RESYNTHESIS
Berry, Jeff
North, Cassandra
Johnson, Michael T.
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[7] An Articulatory Analysis of Phonological Transfer Using Real-Time MRI
Tepperman, Joseph
Bresch, Erik
Kim, Yoon-Chul
Lee, Sungbok
Goldstein, Louis
Narayanan, Shrikanth
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 688 - 691
[8] Statistical multi-stream modeling of real-time MRI articulatory speech data
Bresch, Erik
Katsamanis, Athanasios
Goldstein, Louis
Narayanan, Shrikanth
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010, : 1584 - +
[9] Using Transposed Convolution for Articulatory-to-Acoustic Conversion from Real-Time MRI Data
Tanji, Ryo
Ohmura, Hidefumi
Katsurada, Kouichi
INTERSPEECH 2021, 2021, : 3176 - 3180
[10] Real-time MRI articulatory movement database and its application to articulatory phonetics
Maekawa, Kikuo
Acoustical Science and Technology, 46 (01) : 45 - 54