Visual-to-Speech Conversion Based on Maximum Likelihood Estimation

Citations: 0

Authors:

Ra, Rina [1]

Aihara, Ryo [1]

Takiguchi, Tetsuya [1]

Ariki, Yasuo [1]

Affiliations:

[1] Kobe Univ, Grad Sch Syst Informat, Nada Ku, 1-1 Rokkodai, Kobe, Hyogo, Japan

Source:

PROCEEDINGS OF THE FIFTEENTH IAPR INTERNATIONAL CONFERENCE ON MACHINE VISION APPLICATIONS - MVA2017 | 2017

Keywords:

VOICE CONVERSION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes a visual-to-speech conversion method that converts voiceless lip movements into voiced utterances without recognizing text information. Inspired by a Gaussian mixture model (GMM)-based voice conversion method, the method estimates a GMM from joint visual and audio features and converts input visual features to audio features using maximum likelihood estimation. To capture lip movements, whose frame rate is lower than that of the audio data, we construct long-term image features. The proposed method was evaluated on large-vocabulary continuous speech, and the experimental results show that it effectively estimates the spectral envelopes and fundamental frequencies of audio speech from voiceless lip movements.
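
The conversion idea in the abstract can be illustrated with a minimal sketch. It assumes scalar visual and audio features and uses the frame-wise conditional-expectation mapping of a joint GMM, which is a simpler stand-in for the trajectory-level maximum likelihood estimation the paper names. The component parameters and the names `gauss_pdf`, `convert_frame`, and `toy_gmm` are hypothetical and not taken from the paper.

```python
import math


def gauss_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def convert_frame(x, components):
    """Map one scalar visual feature x to a scalar audio feature.

    Each component is a dict holding the mixture weight "w", the visual and
    audio means "mx" and "my", the visual variance "vxx", and the
    audio-visual cross-covariance "vyx" (all hypothetical values).
    """
    # Posterior probability of each component given only the visual feature.
    likes = [c["w"] * gauss_pdf(x, c["mx"], c["vxx"]) for c in components]
    total = sum(likes)
    posts = [like / total for like in likes]
    # Posterior-weighted sum of the per-component conditional means E[y | x, m].
    return sum(
        p * (c["my"] + c["vyx"] / c["vxx"] * (x - c["mx"]))
        for p, c in zip(posts, components)
    )


# Toy two-component joint model; the numbers are illustrative only.
toy_gmm = [
    {"w": 0.5, "mx": -2.0, "my": 1.0, "vxx": 1.0, "vyx": 0.5},
    {"w": 0.5, "mx": 2.0, "my": 5.0, "vxx": 1.0, "vyx": 0.5},
]

# Convert three visual frames independently, one output per frame.
converted = [convert_frame(x, toy_gmm) for x in (-2.0, 0.0, 2.0)]
```

In this toy model the input `0.0` lies midway between the two visual means, so both components get equal posterior weight and the output is the average of their conditional means. A real system would use vector-valued features with full covariance blocks, and the paper's long-term image features would replace the scalar input assumed here.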


Pages: 518-521

Page count: 4

