Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract

Cited by: 4
Author
Csapo, Tamas Gabor [1 ,2 ]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
[2] MTA ELTE Lendulet Lingual Articulat Res Grp, Budapest, Hungary
Source
INTERSPEECH 2020
Keywords
magnetic resonance imaging; articulatory-to-acoustic mapping; vocal tract; deep neural network; speech recognition; database
DOI
10.21437/Interspeech.2020-15
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Articulatory-to-acoustic (forward) mapping is a technique for predicting speech from articulatory data recorded with various acquisition techniques (e.g. ultrasound tongue imaging, lip video). Real-time MRI (rtMRI) of the vocal tract has not previously been used for this purpose. The advantage of MRI is its high 'relative' spatial resolution: it can capture not only lingual, labial and jaw motion, but also the velum and the pharyngeal region, which is typically not possible with other techniques. In the current paper, we train various DNNs (fully connected, convolutional and recurrent neural networks) for articulatory-to-speech conversion in a speaker-specific way, using rtMRI as input. We use two male and two female speakers of the USC-TIMIT articulatory database, each uttering 460 sentences. We evaluate the results with objective measures (normalized MSE and MCD) and subjective measures (a perceptual test), and show that CNN-LSTM networks that take multiple images as input are preferred; they achieve MCD scores between 2.8 and 4.5 dB. In the experiments, we find that the predictions for speaker 'm1' are significantly weaker than those for the other speakers. We show that this is because 74% of the recordings of speaker 'm1' are out of sync.
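The abstract names the multi-image network input and the two objective measures without defining them. The following is a minimal pure-Python sketch under stated assumptions, not the author's implementation: `stack_frames`, `nmse` and `mel_cepstral_distortion` are hypothetical helper names; the window length is a free parameter; NMSE is taken as the MSE divided by the variance of the reference (one common definition, the paper's normalization may differ); and MCD follows the widely used formula MCD = (10 / ln 10) * sqrt(2 * sum over d=1..D of (c_d - c'_d)^2), averaged over frames, with the energy coefficient c0 excluded.

```python
import math


def stack_frames(frames, window):
    """Group consecutive frames into overlapping input windows.

    frames: sequence of per-frame items (e.g. flattened rtMRI images).
    window: number of consecutive frames per sample (hypothetical value).
    Window i covers frames[i : i + window].
    """
    if window < 1 or window > len(frames):
        raise ValueError("window must be between 1 and len(frames)")
    return [list(frames[i:i + window]) for i in range(len(frames) - window + 1)]


def nmse(reference, predicted):
    """Normalized MSE: mean squared error divided by the reference variance.

    Assumed definition; both inputs are flat, equal-length lists of numbers.
    """
    n = len(reference)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_ref = sum(reference) / n
    mse = sum((r - p) ** 2 for r, p in zip(reference, predicted)) / n
    var = sum((r - mean_ref) ** 2 for r in reference) / n
    if var == 0:
        raise ValueError("reference has zero variance")
    return mse / var


def mel_cepstral_distortion(ref_frames, pred_frames):
    """Frame-averaged mel-cepstral distortion in dB.

    Each frame is a list [c0, c1, ..., cD]; c0 (energy) is skipped,
    as is common practice (an assumption, not confirmed by the source).
    """
    if not ref_frames or len(ref_frames) != len(pred_frames):
        raise ValueError("frame lists must be non-empty and of equal length")
    # 10 / ln(10) * sqrt(2 * sum) == (10 / ln(10)) * sqrt(2) * sqrt(sum)
    const = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, pred in zip(ref_frames, pred_frames):
        sq = sum((r - p) ** 2 for r, p in zip(ref[1:], pred[1:]))
        total += const * math.sqrt(sq)
    return total / len(ref_frames)


if __name__ == "__main__":
    print(stack_frames([1, 2, 3, 4], 2))
    print(nmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
    print(mel_cepstral_distortion([[5.0, 0.0, 0.0]], [[9.0, 1.0, 0.0]]))
```

Both metrics compare frames pairwise, so they presuppose that the predicted and reference frames are time-aligned.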
Pages: 2722-2726
Number of pages: 5