Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract

Cited by: 4

Authors:

Csapo, Tamas Gabor [1, 2]

Affiliations:

[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary

[2] MTA ELTE Lendulet Lingual Articulat Res Grp, Budapest, Hungary

Source:

INTERSPEECH 2020 | 2020

Keywords:

magnetic resonance imaging; articulatory-to-acoustic mapping; vocal tract; deep neural network; speech recognition; database

DOI:

10.21437/Interspeech.2020-15

Chinese Library Classification (CLC) codes:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Articulatory-to-acoustic (forward) mapping is a technique for predicting speech from articulatory data recorded with various acquisition techniques (e.g. ultrasound tongue imaging, lip video). Real-time MRI (rtMRI) of the vocal tract has not been used before for this purpose. The advantage of MRI is that it has a high 'relative' spatial resolution: it can capture not only lingual, labial and jaw motion, but also the velum and the pharyngeal region, which is typically not possible with other techniques. In the current paper, we train various DNNs (fully connected, convolutional and recurrent neural networks) for articulatory-to-speech conversion, using rtMRI as input, in a speaker-specific way. We use two male and two female speakers of the USC-TIMIT articulatory database, each of them uttering 460 sentences. We evaluate the results with objective measures (Normalized MSE and MCD) and a subjective measure (perceptual test), and show that CNN-LSTM networks that take multiple images as input are preferred, achieving MCD scores between 2.8 and 4.5 dB. In the experiments, we find that the predictions for speaker 'm1' are significantly weaker than those for the other speakers. We show that this is caused by the fact that 74% of the recordings of speaker 'm1' are out of sync.
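
For readers unfamiliar with the MCD figures quoted in the abstract, the following is a minimal sketch of one commonly used mel-cepstral distortion formula: the per-frame value (10 / ln 10) * sqrt(2 * sum of squared coefficient differences), averaged over frames, with coefficient 0 (energy) excluded. The function name `mel_cepstral_distortion` and the list-of-lists input layout are illustrative assumptions; the abstract does not state which MCD variant or cepstral order the paper uses, so this should not be read as the author's evaluation code.

```python
import math


def mel_cepstral_distortion(ref_frames, pred_frames):
    """Mean MCD in dB between two time-aligned sequences of mel-cepstral frames.

    Each argument is a list of frames; each frame is a list of coefficients.
    Coefficient 0 (energy) is skipped, which is a common convention and an
    assumption here, not something stated in the abstract.
    """
    if not ref_frames or len(ref_frames) != len(pred_frames):
        raise ValueError("sequences must be non-empty and of equal length")

    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, pred in zip(ref_frames, pred_frames):
        if len(ref) != len(pred):
            raise ValueError("frames must have the same number of coefficients")
        # Sum of squared differences over coefficients 1..D.
        squared = sum((r - p) ** 2 for r, p in zip(ref[1:], pred[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)
```

Lower values mean the predicted spectral envelope is closer to the reference, and identical sequences give exactly 0 dB. The sketch assumes the two sequences are already time-aligned frame by frame.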

Pages: 2722-2726

Page count: 5
