Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract

Cited by: 4
Authors
Csapo, Tamas Gabor [1, 2]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
[2] MTA ELTE Lendulet Lingual Articulat Res Grp, Budapest, Hungary
Source
Keywords
magnetic resonance imaging; articulatory-to-acoustic mapping; vocal tract; deep neural network; speech recognition; database
DOI
10.21437/Interspeech.2020-15
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104 ; 100213 ;
Abstract
Articulatory-to-acoustic (forward) mapping predicts speech from articulatory data recorded with various acquisition techniques (e.g. ultrasound tongue imaging, lip video). Real-time MRI (rtMRI) of the vocal tract has not been used before for this purpose. The advantage of MRI is that it has a high 'relative' spatial resolution: it can capture not only lingual, labial and jaw motion, but also the velum and the pharyngeal region, which is typically not possible with other techniques. In the current paper, we train various DNNs (fully connected, convolutional and recurrent neural networks) for articulatory-to-speech conversion, using rtMRI as input, in a speaker-specific way. We use two male and two female speakers of the USC-TIMIT articulatory database, each of them uttering 460 sentences. We evaluate the results with objective measures (Normalized MSE and MCD) and a subjective measure (perceptual test), and show that CNN-LSTM networks, which take multiple images as input, are preferred and achieve MCD scores between 2.8 and 4.5 dB. In the experiments, we find that the predictions for speaker 'm1' are significantly weaker than those for the other speakers. We show that this is caused by the fact that 74% of the recordings of speaker 'm1' are out of sync.
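The abstract reports Mel-Cepstral Distortion (MCD) in dB as one of its objective measures. As a rough illustration, the commonly used definition of MCD can be sketched as below. This is an assumption about the general metric, not the paper's confirmed implementation: the function name `mel_cepstral_distortion`, the `skip_c0` option, and the choice to average per-frame values are all hypothetical, and the frames are assumed to be already time-aligned.

```python
import math


def mel_cepstral_distortion(ref_frames, pred_frames, skip_c0=True):
    """Mean MCD in dB between two time-aligned mel-cepstral sequences.

    ref_frames, pred_frames: lists of frames, each frame a list of
    mel-cepstral coefficients. Hypothetical helper for illustration only.
    """
    if not ref_frames or len(ref_frames) != len(pred_frames):
        raise ValueError("sequences must be non-empty and equally long")

    scale = 10.0 / math.log(10.0)   # converts natural-log units to dB
    start = 1 if skip_c0 else 0     # c0 mostly carries energy, often excluded

    total = 0.0
    for ref, pred in zip(ref_frames, pred_frames):
        if len(ref) != len(pred):
            raise ValueError("frames must have the same number of coefficients")
        squared_error = sum((r - p) ** 2 for r, p in zip(ref[start:], pred[start:]))
        total += scale * math.sqrt(2.0 * squared_error)

    # Average the per-frame distortion over the utterance.
    return total / len(ref_frames)
```

Lower values indicate predicted spectra closer to the reference; identical sequences give 0 dB under this definition.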
Pages: 2722-2726
Page count: 5