Visual-to-Speech Conversion Based on Maximum Likelihood Estimation

Times cited: 0
Authors
Ra, Rina [1]
Aihara, Ryo [1]
Takiguchi, Tesuya [1]
Ariki, Yasuo [1]
Affiliation
[1] Kobe University, Graduate School of System Informatics, Nada-ku, 1-1 Rokkodai, Kobe, Hyogo, Japan
Keywords
Voice conversion
DOI
Not available
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a visual-to-speech conversion method that converts voiceless lip movements into voiced utterances without recognizing text information. Inspired by Gaussian mixture model (GMM)-based voice conversion, a GMM is trained on joint visual and audio feature vectors, and input visual features are converted to audio features by maximum likelihood estimation. To capture lip movements, whose frame rate is lower than that of the audio data, we construct long-term image features. The proposed method was evaluated on large-vocabulary continuous speech, and the experimental results show that it effectively estimates the spectral envelopes and fundamental frequencies of audio speech from voiceless lip movements.
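The abstract describes the pipeline only at a high level. As a minimal sketch under stated assumptions, the code below illustrates its two main ingredients: (1) building long-term image features by concatenating each video frame with its neighbours and repeating frames to reach the audio frame rate, and (2) mapping visual features to audio features with a joint-density GMM. The helper names (`stack_context`, `upsample`, `convert_mmse`) and the frame-stacking scheme are hypothetical, not taken from the paper, and the GMM parameters are assumed to have been trained beforehand (for example by EM on joint [visual; audio] vectors). The conversion shown is the simpler frame-wise minimum mean-square-error (conditional expectation) mapping, swapped in for the trajectory-based maximum likelihood estimation the abstract refers to.

```python
import numpy as np


def stack_context(feats, width):
    """Hypothetical long-term feature builder: concatenate each frame with
    its `width` neighbours on both sides (edges padded by repetition)."""
    T = feats.shape[0]
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    # Column block k holds frame t + k - width for every t.
    return np.hstack([padded[k:k + T] for k in range(2 * width + 1)])


def upsample(feats, factor):
    """Assumed alignment step: repeat each video frame `factor` times so the
    visual stream matches the (higher) audio frame rate."""
    return np.repeat(feats, factor, axis=0)


def gaussian_logpdf(x, mean, cov):
    """Log density of a full-covariance Gaussian for each row of x."""
    d = x.shape[1]
    diff = x - mean
    sol = np.linalg.solve(cov, diff.T).T  # rows: cov^{-1} (x_t - mean)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (np.sum(diff * sol, axis=1) + logdet + d * np.log(2 * np.pi))


def convert_mmse(x, weights, means, covs, dx):
    """Frame-wise E[y | x] under a pre-trained joint GMM over z = [x; y].

    weights: (M,), means: (M, dx + dy), covs: (M, dx + dy, dx + dy).
    NOTE: this is MMSE conversion, not maximum likelihood trajectory estimation.
    """
    M = len(weights)
    # Posterior P(m | x_t) from the marginal GMM over the visual part only.
    log_post = np.stack(
        [np.log(weights[m]) + gaussian_logpdf(x, means[m, :dx], covs[m, :dx, :dx])
         for m in range(M)],
        axis=1,
    )
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    y = np.zeros((x.shape[0], means.shape[1] - dx))
    for m in range(M):
        mu_x, mu_y = means[m, :dx], means[m, dx:]
        s_xx, s_yx = covs[m, :dx, :dx], covs[m, dx:, :dx]
        # Conditional mean: mu_y + S_yx S_xx^{-1} (x_t - mu_x), one row per frame.
        cond = mu_y + np.linalg.solve(s_xx, (x - mu_x).T).T @ s_yx.T
        y += post[:, [m]] * cond
    return y
```

In use, visual features of shape (T_video, d) would pass through `stack_context` and `upsample` to give `x`, and `convert_mmse(x, weights, means, covs, dx=x.shape[1])` would return one audio feature vector (for example, spectral envelope parameters) per audio frame. Replacing the frame-wise mapping with maximum likelihood trajectory estimation would additionally require dynamic (delta) features in the audio part of the joint vector.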
Pages: 518-521
Number of pages: 4