Speech acquisition in meetings with an audio-visual sensor array

Cited by: 0

Authors:

McCowan, I [1]

Krishna, MH [1]

Gatica-Perez, D [1]

Moore, D [1]

Ba, S [1]

Affiliation:

[1] IDIAP Res Inst, CH-1920 Martigny, Switzerland

Source:

2005 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1 AND 2 | 2005

DOI: not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Close-talking headset microphones have traditionally been used for speech acquisition in a number of applications, as they naturally provide a higher signal-to-noise ratio (needed for recognition tasks) than single distant microphones. However, in multi-party conversational meetings, microphone arrays represent an important alternative to close-talking microphones, as they allow for localisation and tracking of speakers and signal-independent enhancement, while providing a non-intrusive, hands-free operation mode. In this article, we investigate the use of an audio-visual sensor array, composed of a small table-top microphone array and a set of cameras, for speaker tracking and speech enhancement in meetings. Our methodology first fuses audio and video for person tracking, and then integrates the output of the tracker with a beamformer for speech enhancement. We compare and discuss the features of the resulting speech signal with respect to that obtained from single close-talking and table-top microphones.
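The abstract does not say which beamformer is used or how the tracker output is passed to it. As a rough illustration only, the sketch below assumes the simplest option, a delay-and-sum beamformer with whole-sample delays, steered at a speaker position such as a tracker might report. The function names, the speed-of-sound constant, and the coordinate convention are all assumptions made for this example and are not taken from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def steering_delays(mic_positions, source_position, sample_rate):
    """Whole-sample delays that time-align a source across microphones.

    The farthest microphone hears the source last, so it gets delay 0;
    nearer microphones are delayed by the difference in travel time.
    """
    distances = [math.dist(m, source_position) for m in mic_positions]
    farthest = max(distances)
    return [
        round((farthest - d) / SPEED_OF_SOUND * sample_rate)
        for d in distances
    ]


def delay_and_sum(channels, delays):
    """Delay each channel by its integer delay, then average the channels."""
    n = len(channels[0])
    out = [0.0] * n
    for signal, delay in zip(channels, delays):
        for t in range(delay, n):
            out[t] += signal[t - delay]
    return [v / len(channels) for v in out]


def enhance(channels, mic_positions, tracked_position, sample_rate):
    """Steer the beamformer at a (hypothetical) tracker-supplied position."""
    delays = steering_delays(mic_positions, tracked_position, sample_rate)
    return delay_and_sum(channels, delays)
```

In this sketch, `tracked_position` stands in for whatever location estimate the audio-visual tracker produces. Signals arriving from that position add coherently, while sound from other directions is averaged out of phase and attenuated. A practical system would likely use fractional delays or frequency-domain filtering rather than rounding to whole samples.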


Pages: 1383-1386

Page count: 4

