Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli

Cited: 0
Authors
David Sodoyer
Jean-Luc Schwartz
Laurent Girin
Jacob Klinkisch
Christian Jutten
Affiliations
[1] Université Stendhal, Institut de la Communication Parlée, Institut National Polytechnique de Grenoble
Keywords
blind source separation; lipreading; audio-visual speech processing;
DOI
Not available
Abstract
We present a new approach to the source separation problem in the case of multiple speech signals. The method is based on automatic lipreading: the objective is to extract an acoustic speech signal from among other acoustic signals by exploiting its coherence with the speaker's lip movements. We consider the case of an additive stationary mixture of decorrelated sources, with no further assumptions of independence or non-Gaussianity. First, we present a theoretical framework showing that it is indeed possible to separate a source when some of its spectral characteristics are provided to the system. We then address the case of audio-visual sources and show how, once a statistical model of the joint probability of visual and spectral audio inputs has been learnt to quantify audio-visual coherence, separation can be achieved by maximizing this probability. Finally, we present separation results on a corpus of vowel-plosive-vowel sequences uttered by a single speaker and embedded in a mixture of other voices, showing that separation can be quite good for mixtures of 2, 3, and 5 sources. These results, while preliminary, are encouraging and are discussed with respect to their potential complementarity with traditional audio-only separation and enhancement techniques.
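The separation principle described in the abstract can be sketched on synthetic data: learn a joint statistical model of lip and spectral-audio features on clean audio-visual speech, then choose the extraction filter that maximizes the likelihood of the extracted signal under that model. Everything below is an illustrative assumption, not the paper's actual implementation: the joint model is a simple Gaussian over (lip opening, centered log frame energy), the "speech" source is amplitude-modulated noise whose energy tracks the lip parameter, and the unmixing filter is parameterized by a single angle over a two-channel mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic audio-visual data (illustrative stand-in for real speech) ---
T, F = 200, 64                      # frames, samples per frame
lips = rng.uniform(0.2, 1.0, T)     # "lip opening" per frame
# Source 1: amplitude modulated by the lip parameter -> audio-visual coherence
s1 = (lips[:, None] * rng.standard_normal((T, F))).ravel()
# Source 2: independent interfering voice (no coherence with the lips)
s2 = rng.standard_normal(T * F)

A = np.array([[1.0, 0.8], [0.6, 1.0]])   # additive stationary mixture
x = A @ np.vstack([s1, s2])              # two observed mixture signals

def centered_log_energy(sig):
    """Per-frame log energy with the mean removed (scale-invariant feature)."""
    e = np.log((sig.reshape(T, F) ** 2).mean(axis=1) + 1e-12)
    return e - e.mean()

# --- Learn the joint AV model on the clean target source (training phase) ---
feats = np.vstack([lips - lips.mean(), centered_log_energy(s1)])  # (2, T)
mu = feats.mean(axis=1)
Cinv = np.linalg.inv(np.cov(feats))

def av_log_likelihood(sig):
    """Mean Gaussian log-likelihood (up to a constant) of (lip, energy) pairs."""
    z = np.vstack([lips - lips.mean(), centered_log_energy(sig)]) - mu[:, None]
    return -0.5 * np.einsum('it,ij,jt->', z, Cinv, z) / T

# --- Separation: pick the unmixing direction maximizing AV coherence ---
thetas = np.linspace(-np.pi / 2, np.pi / 2, 721)
scores = [av_log_likelihood(np.cos(t) * x[0] + np.sin(t) * x[1])
          for t in thetas]
best = thetas[int(np.argmax(scores))]
y = np.cos(best) * x[0] + np.sin(best) * x[1]

corr = abs(np.corrcoef(y, s1)[0, 1])
print(f"correlation of extracted signal with target source: {corr:.3f}")
```

Centering the log energies makes the feature invariant to the scale ambiguity inherent in blind separation, so the likelihood search only has to recover the unmixing direction; the interfering source, having energy fluctuations independent of the lip trajectory, pulls the joint statistics away from the learnt model and is rejected.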
Related Papers (50 in total)
  • [41] Deep audio-visual speech separation based on facial motion
    Rigal, Remi
    Chodorowski, Jacques
    Zerr, Benoit
    INTERSPEECH 2021, 2021, : 3540 - 3544
  • [42] DEEP VARIATIONAL GENERATIVE MODELS FOR AUDIO-VISUAL SPEECH SEPARATION
    Viet-Nhat Nguyen
    Sadeghi, Mostafa
    Ricci, Elisa
    Alameda-Pineda, Xavier
    2021 IEEE 31ST INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2021,
  • [43] Audio-Visual Speech Separation Using I-Vectors
    Luo, Yiyu
    Wang, Jing
    Wang, Xinyao
    Wen, Liang
    Wang, Lizhong
    2019 2ND IEEE INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND SIGNAL PROCESSING (ICICSP), 2019, : 276 - 280
  • [44] Do gender differences in audio-visual benefit and visual influence in audio-visual speech perception emerge with age?
    Alm, Magnus
    Behne, Dawn
    FRONTIERS IN PSYCHOLOGY, 2015, 6
  • [45] Somatosensory contribution to audio-visual speech processing
    Ito, Takayuki
    Ohashi, Hiroki
    Gracco, Vincent L.
    CORTEX, 2021, 143 : 195 - 204
  • [46] Some experiments in audio-visual speech processing
    Chollet, G.
    Landais, R.
    Hueber, T.
    Bredin, H.
    Mokbel, C.
    Perrot, P.
    Zouari, L.
    ADVANCES IN NONLINEAR SPEECH PROCESSING, 2007, 4885 : 28+
  • [47] Complementary models for audio-visual speech classification
    Sad, Gonzalo D.
    Terissi, Lucas D.
    Gomez, Juan C.
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2022, 25 (01) : 231 - 249
  • [48] Speaker independent audio-visual speech recognition
    Zhang, Y
    Levinson, S
    Huang, T
    2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 1073 - 1076
  • [49] A coupled HMM for audio-visual speech recognition
    Nefian, AV
    Liang, LH
    Pi, XB
    Xiaoxiang, L
    Mao, C
    Murphy, K
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 2013 - 2016
  • [50] AUDIO-VISUAL SPEECH PERCEPTION - A PRELIMINARY REPORT
    EWERTSEN, HW
    NIELSEN, HB
    NIELSEN, SS
    ACTA OTO-LARYNGOLOGICA, 1970: 229+