An audio-visual approach to simultaneous-speaker speech recognition

Cited by: 0

Authors:

Patterson, EK [1]

Gowdy, JN [1]

Affiliation:

[1] Univ N Carolina, Dept Comp Sci, Wilmington, NC 28403 USA

Source:

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO AND ELECTROACOUSTICS MULTIMEDIA SIGNAL PROCESSING | 2003

Keywords:

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Audio-visual speech recognition is an area with great potential to help solve challenging problems in speech processing. Difficulties due to background noises are significantly reduced by the additional information provided by extra visual features. The presence of additional speech from other talkers during recording may be viewed as one of the most difficult sources of noise. This paper presents a study using audio-visual speech recognition for simultaneous-speaker speech recognition. The desired goal is to separate and potentially recognize speech from several simultaneous speakers. Speaker pairs from the CUAVE multimodal speech corpus are used in this work. Audio-visual techniques are compared against speaker-independent and speaker-dependent audio-only methods for speech recognition of individuals from these pairs. For information on obtaining CUAVE, please visit the following web page (http://ece.clemson.edu/speech).

Pages: 780-783

Page count: 4

50 items in total

[31] Separation of audio-visual speech sources: A new approach exploiting the audio-visual coherence of speech stimuli
Sodoyer, D. (sodoyer@icp.inpg.fr), 1600, Hindawi Publishing Corporation (2002):
[32] Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli
David Sodoyer
Jean-Luc Schwartz
Laurent Girin
Jacob Klinkisch
Christian Jutten
EURASIP Journal on Advances in Signal Processing, 2002
[33] Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments
Chao, Guan-Lin
Chan, William
Lane, Ian
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2120 - 2124
[34] A CLOSER LOOK AT AUDIO-VISUAL MULTI-PERSON SPEECH RECOGNITION AND ACTIVE SPEAKER SELECTION
Braga, Otavio
Siohan, Olivier
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6863 - 6867
[35] Integration strategies for audio-visual speech processing: Applied to text-dependent speaker recognition
Lucey, S
Chen, TH
Sridharan, S
Chandran, V
IEEE TRANSACTIONS ON MULTIMEDIA, 2005, 7 (03) : 495 - 506
[36] Integration of audio-visual information for multi-speaker multimedia speaker recognition
Yang, Jichen
Chen, Fangfan
Cheng, Yu
Lin, Pei
DIGITAL SIGNAL PROCESSING, 2024, 145
[37] Audio-visual fuzzy fusion for robust speech recognition
Malcangi, M.
Ouazzane, K.
Patel, P.
2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
[38] Audio-visual speech recognition using lstm and cnn
El Maghraby E.E.
Gody A.M.
Farouk M.H.
Recent Advances in Computer Science and Communications, 2021, 14 (06) : 2023 - 2039
[39] Building a data corpus for audio-visual speech recognition
Chitu, Alin G.
Rothkrantz, Leon J. M.
EUROMEDIA '2007, 2007, : 88 - 92
[40] Audio-Visual Automatic Speech Recognition for Connected Digits
Wang, Xiaoping
Hao, Yufeng
Fu, Degang
Yuan, Chunwei
2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL III, PROCEEDINGS, 2008, : 328 - +