Speech acquisition in meetings with an audio-visual sensor array

Authors
McCowan, I [1]
Krishna, MH [1]
Gatica-Perez, D [1]
Moore, D [1]
Ba, S [1]
Affiliations
[1] IDIAP Res Inst, CH-1920 Martigny, Switzerland
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Close-talk headset microphones have traditionally been used for speech acquisition in a number of applications, as they naturally provide a higher signal-to-noise ratio (needed for recognition tasks) than single distant microphones. However, in multi-party conversational meetings, microphone arrays represent an important alternative to close-talking microphones, as they allow for localisation and tracking of speakers and signal-independent enhancement, while providing a non-intrusive, hands-free mode of operation. In this article, we investigate the use of an audio-visual sensor array, composed of a small table-top microphone array and a set of cameras, for speaker tracking and speech enhancement in meetings. Our methodology first fuses audio and video for person tracking, and then integrates the output of the tracker with a beamformer for speech enhancement. We compare and discuss the features of the resulting speech signal with respect to that obtained from single close-talking and table-top microphones.
Pages: 1383-1386
Page count: 4