Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract

Cited by: 4

Authors:

Csapo, Tamas Gabor [1,2]

Affiliations:

[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary

[2] MTA ELTE Lendulet Lingual Articulat Res Grp, Budapest, Hungary

Source:

INTERSPEECH 2020 | 2020

Keywords:

magnetic resonance imaging; articulatory-to-acoustic mapping; vocal tract; deep neural network; speech recognition; database

DOI:

10.21437/Interspeech.2020-15

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification code:

100104; 100213

Abstract:

Articulatory-to-acoustic (forward) mapping is a technique to predict speech from articulatory data acquired with various techniques (e.g. ultrasound tongue imaging, lip video). Real-time MRI (rtMRI) of the vocal tract has not been used before for this purpose. The advantage of MRI is that it has a high 'relative' spatial resolution: it can capture not only lingual, labial and jaw motion, but also the velum and the pharyngeal region, which is typically not possible with other techniques. In the current paper, we train various DNNs (fully connected, convolutional and recurrent neural networks) for articulatory-to-speech conversion, using rtMRI as input, in a speaker-specific way. We use two male and two female speakers of the USC-TIMIT articulatory database, each of them uttering 460 sentences. We evaluate the results with objective measures (Normalized MSE and MCD) and a subjective measure (perceptual test), and show that CNN-LSTM networks, which take multiple images as input, are preferred and achieve MCD scores between 2.8 and 4.5 dB. In the experiments, we find that the predictions for speaker 'm1' are significantly weaker than those for the other speakers. We show that this is caused by the fact that 74% of the recordings of speaker 'm1' are out of sync.
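
For readers unfamiliar with the MCD figures quoted in the abstract, the following is a minimal sketch of how mel-cepstral distortion is commonly computed. It is not the paper's evaluation code: the function name `mel_cepstral_distortion`, the `skip_c0` option, and the assumption that the two sequences are already time-aligned are all illustrative choices, and the paper's exact configuration (number of coefficients, handling of the energy term) is not stated in this record.

```python
import math


def mel_cepstral_distortion(reference, predicted, skip_c0=True):
    """Mean mel-cepstral distortion (dB) over time-aligned frames.

    reference, predicted: lists of frames, each frame a list of
    mel-cepstral coefficients. Hypothetical helper, not the paper's code.
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")

    # The 0th coefficient mostly carries energy and is often excluded.
    start = 1 if skip_c0 else 0
    scale = 10.0 / math.log(10.0)

    total = 0.0
    for ref_frame, pred_frame in zip(reference, predicted):
        squared_error = sum(
            (r - p) ** 2
            for r, p in zip(ref_frame[start:], pred_frame[start:])
        )
        # Per-frame distortion: (10 / ln 10) * sqrt(2 * sum of squared diffs)
        total += scale * math.sqrt(2.0 * squared_error)

    # Average the per-frame distortion over the utterance.
    return total / len(reference)
```

Lower values indicate that the predicted spectral envelope is closer to the reference, which is the sense in which the 2.8 to 4.5 dB range above should be read.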


Pages: 2722-2726

Number of pages: 5
