Speech acquisition in meetings with an audio-visual sensor array

Cited by: 0

Authors:

McCowan, I [1]

Krishna, MH [1]

Gatica-Perez, D [1]

Moore, D [1]

Ba, S [1]

Affiliations:

[1] IDIAP Res Inst, CH-1920 Martigny, Switzerland

Source:

2005 IEEE International Conference on Multimedia and Expo (ICME), Vols 1 and 2 | 2005

Keywords: (none listed)

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Close-talk headset microphones have traditionally been used for speech acquisition in a number of applications, as they naturally provide a higher signal-to-noise ratio (needed for recognition tasks) than single distant microphones. However, in multi-party conversational meetings, microphone arrays represent an important alternative to close-talking microphones, as they allow for localisation and tracking of speakers and signal-independent enhancement, while providing a non-intrusive, hands-free operation mode. In this article, we investigate the use of an audio-visual sensor array, composed of a small table-top microphone array and a set of cameras, for speaker tracking and speech enhancement in meetings. Our methodology first fuses audio and video for person tracking, and then integrates the output of the tracker with a beamformer for speech enhancement. We compare and discuss the features of the resulting speech signal with respect to that obtained from single close-talking and table-top microphones.
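The abstract does not say which beamformer design the authors use, so the following is only a minimal sketch of the general idea of steering a microphone array toward a tracked speaker position. It swaps in a plain delay-and-sum beamformer as the simplest possible stand-in. The array geometry (8 microphones on a circle of radius 0.1 m), the speaker position, the sampling rate, and the function names `steering_delays` and `delay_and_sum` are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_delays(mic_positions, source_position, c=SPEED_OF_SOUND):
    """Propagation delays (seconds) to each microphone, relative to the closest one."""
    distances = np.linalg.norm(mic_positions - source_position, axis=1)
    return (distances - distances.min()) / c


def delay_and_sum(signals, mic_positions, source_position, fs, c=SPEED_OF_SOUND):
    """Time-align all channels on the source position and average them.

    signals has shape (n_mics, n_samples). Alignment is done with phase
    shifts in the frequency domain, so the shift is circular; a real system
    would process short overlapping frames instead.
    """
    n_mics, n_samples = signals.shape
    delays = steering_delays(mic_positions, source_position, c)
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    # Multiplying by exp(+j*2*pi*f*tau) advances a channel by tau seconds.
    aligned = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(aligned.mean(axis=0), n=n_samples)


# Hypothetical setup: circular table-top array and a speaker position such as
# an audio-visual tracker might report (all values are illustrative).
fs = 16000
angles = 2 * np.pi * np.arange(8) / 8
mics = np.stack([0.1 * np.cos(angles), 0.1 * np.sin(angles), np.zeros(8)], axis=1)
speaker = np.array([1.0, 0.5, 0.3])

rng = np.random.default_rng(0)
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440.0 * t)  # stand-in for a speech signal
noise_std = 0.5

# Simulate each channel: delay the clean signal, then add independent noise.
delays = steering_delays(mics, speaker)
freqs = np.fft.rfftfreq(fs, d=1.0 / fs)
delayed = np.fft.irfft(
    np.fft.rfft(clean)[None, :] * np.exp(-2j * np.pi * freqs[None, :] * delays[:, None]),
    n=fs,
)
noisy = delayed + noise_std * rng.standard_normal(delayed.shape)

enhanced = delay_and_sum(noisy, mics, speaker, fs)
print("single-microphone noise power:", noise_std ** 2)
print("beamformer error power:", np.mean((enhanced - clean) ** 2))
```

In this toy setting the noise is independent across channels, so averaging the aligned channels lowers the residual noise power relative to any single microphone. Real meeting rooms add reverberation and competing speakers, which is where more elaborate beamformers and an accurate tracker estimate matter.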


Pages: 1383-1386

Number of pages: 4

