Speech acquisition in meetings with an audio-visual sensor array

Cited by: 0
Authors
McCowan, I [1]
Krishna, MH [1]
Gatica-Perez, D [1]
Moore, D [1]
Ba, S [1]
Affiliations
[1] IDIAP Res Inst, CH-1920 Martigny, Switzerland
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Close-talking headset microphones have traditionally been used for speech acquisition in a number of applications, as they naturally provide a higher signal-to-noise ratio (needed for recognition tasks) than single distant microphones. However, in multi-party conversational meetings, microphone arrays represent an important alternative to close-talking microphones, as they allow for localisation and tracking of speakers and signal-independent enhancement, while providing a non-intrusive, hands-free mode of operation. In this article, we investigate the use of an audio-visual sensor array, composed of a small table-top microphone array and a set of cameras, for speaker tracking and speech enhancement in meetings. Our methodology first fuses audio and video for person tracking, and then integrates the output of the tracker with a beamformer for speech enhancement. We compare and discuss the features of the resulting speech signal with respect to that obtained from single close-talking and table-top microphones.
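The abstract does not say which beamformer is used, so the following is only a minimal sketch of the general idea of steering a microphone array toward a position supplied by a tracker. It assumes the simplest possible technique, a delay-and-sum beamformer with whole-sample delays, which may differ from the authors' method. The function names, the speed-of-sound constant, and the array geometry are all hypothetical and chosen for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value, not from the paper


def steering_delays(mic_positions, source_position, sample_rate):
    """Per-microphone arrival delays, in whole samples, relative to the nearest mic.

    source_position stands in for the tracker output (hypothetical interface).
    """
    distances = [math.dist(mic, source_position) for mic in mic_positions]
    nearest = min(distances)
    return [round((d - nearest) / SPEED_OF_SOUND * sample_rate) for d in distances]


def delay_and_sum(channels, delays):
    """Advance each channel by its delay, then average the aligned samples."""
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


# Illustrative usage: two mics 20 cm apart, speaker 2 m away on the array axis.
mics = [(0.0, 0.0), (0.2, 0.0)]
speaker = (2.0, 0.0)
delays = steering_delays(mics, speaker, sample_rate=16000)

# Simulate a unit pulse that reaches each mic with the corresponding delay.
channels = [[0.0] * 32 for _ in mics]
for ch, d in zip(channels, delays):
    ch[5 + d] = 1.0

enhanced = delay_and_sum(channels, delays)
```

When the steering position matches the true speaker position, the pulse copies line up and add coherently, whereas sound from other directions is misaligned and is attenuated by the averaging.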
Pages: 1383-1386
Page count: 4