Speaker dependent articulatory-to-acoustic mapping using real-time MRI of the vocal tract

Cited by: 4
Author
Csapo, Tamas Gabor [1, 2]
Affiliations
[1] Budapest Univ Technol & Econ, Dept Telecommun & Media Informat, Budapest, Hungary
[2] MTA ELTE Lendulet Lingual Articulat Res Grp, Budapest, Hungary
Source
INTERSPEECH 2020
Keywords
magnetic resonance imaging; articulatory-to-acoustic mapping; vocal tract; deep neural network; speech recognition; database
DOI
10.21437/Interspeech.2020-15
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
Articulatory-to-acoustic (forward) mapping is a technique for predicting speech from data captured with various articulatory acquisition techniques (e.g. ultrasound tongue imaging, lip video). Real-time MRI (rtMRI) of the vocal tract has not been used for this purpose before. The advantage of MRI is its high 'relative' spatial resolution: it can capture not only lingual, labial and jaw motion, but also the velum and the pharyngeal region, which is typically not possible with other techniques. In the current paper, we train various DNNs (fully connected, convolutional and recurrent neural networks) for articulatory-to-speech conversion in a speaker-specific way, using rtMRI as input. We use two male and two female speakers of the USC-TIMIT articulatory database, each of them uttering 460 sentences. We evaluate the results with objective measures (Normalized MSE and MCD) and subjective measures (a perceptual test), and show that CNN-LSTM networks that take multiple images as input are preferred, achieving MCD scores between 2.8 and 4.5 dB. In the experiments, we find that the predictions for speaker 'm1' are significantly weaker than those for the other speakers. We show that this is because 74% of the recordings of speaker 'm1' are out of sync.
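The two objective measures named in the abstract can be illustrated with a short, stdlib-only Python sketch. This record does not give the paper's exact definitions, so the following are assumptions: MCD uses the common form (10 / ln 10) * sqrt(2 * sum of squared mel-cepstral differences), excludes the energy coefficient c0, and is averaged over time-aligned frames; NMSE is taken as the mean squared error divided by the variance of the reference. The function names `mel_cepstral_distortion` and `normalized_mse` are hypothetical.

```python
import math
from typing import Sequence

# Scaling factor of the common MCD definition: (10 / ln 10) * sqrt(2).
# Assumed variant; the paper's exact formula is not stated in this record.
_MCD_CONST = 10.0 / math.log(10.0) * math.sqrt(2.0)


def mel_cepstral_distortion(ref: Sequence[Sequence[float]],
                            est: Sequence[Sequence[float]],
                            skip_c0: bool = True) -> float:
    """Frame-averaged MCD in dB between two time-aligned mel-cepstrum sequences."""
    if not ref or len(ref) != len(est):
        raise ValueError("sequences must be non-empty and time-aligned")
    start = 1 if skip_c0 else 0  # c0 (energy) is commonly left out
    total = 0.0
    for r, e in zip(ref, est):
        if len(r) != len(e):
            raise ValueError("frames must have the same cepstral order")
        sq = sum((rc - ec) ** 2 for rc, ec in zip(r[start:], e[start:]))
        total += _MCD_CONST * math.sqrt(sq)
    return total / len(ref)


def normalized_mse(ref: Sequence[float], est: Sequence[float]) -> float:
    """MSE divided by the variance of the reference (one common NMSE definition)."""
    if not ref or len(ref) != len(est):
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(ref)
    mean = sum(ref) / n
    mse = sum((r - e) ** 2 for r, e in zip(ref, est)) / n
    var = sum((r - mean) ** 2 for r in ref) / n
    if var == 0.0:
        raise ValueError("reference has zero variance")
    return mse / var
```

With this NMSE definition, a predictor that always outputs the reference mean scores exactly 1.0, so values below 1.0 indicate that the model does better than that trivial baseline.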
Pages: 2722-2726
Number of pages: 5