An audio-visual approach to simultaneous-speaker speech recognition

被引:0
|
作者
Patterson, EK [1 ]
Gowdy, JN [1 ]
机构
[1] Univ N Carolina, Dept Comp Sci, Wilmington, NC 28403 USA
关键词
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Audio-visual speech recognition is an area with great potential to help solve challenging problems in speech processing. Difficulties due to background noises are significantly reduced by the additional information provided by extra visual features. The presence of additional speech from other talkers during recording may be viewed as one of the most difficult sources of noise. This paper presents a study using audio-visual speech recognition for simultaneous-speaker speech recognition. The desired goal is to separate and potentially recognize speech from several simultaneous speakers. Speaker pairs from the CUAVE multimodal speech corpus are used in this work. Audio-visual techniques are compared against speaker-independent and speaker-dependent audio-only methods for speech recognition of individuals from these pairs. For information on obtaining CUAVE, please visit the following web page (http://ece.clemson.edu/speech).
引用
收藏
页码:780 / 783
页数:4
相关论文
共 50 条
  • [31] Separation of audio-visual speech sources: A new approach exploiting the audio-visual coherence of speech stimuli
    Sodoyer, D. (sodoyer@icp.inpg.fr), 1600, Hindawi Publishing Corporation (2002):
  • [32] Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli
    David Sodoyer
    Jean-Luc Schwartz
    Laurent Girin
    Jacob Klinkisch
    Christian Jutten
    EURASIP Journal on Advances in Signal Processing, 2002
  • [33] Speaker-Targeted Audio-Visual Models for Speech Recognition in Cocktail-Party Environments
    Chao, Guan-Lin
    Chan, William
    Lane, Ian
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2120 - 2124
  • [34] A CLOSER LOOK AT AUDIO-VISUAL MULTI-PERSON SPEECH RECOGNITION AND ACTIVE SPEAKER SELECTION
    Braga, Otavio
    Siohan, Olivier
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6863 - 6867
  • [35] Integration strategies for audio-visual speech processing: Applied to text-dependent speaker recognition
    Lucey, S
    Chen, TH
    Sridharan, S
    Chandran, V
    IEEE TRANSACTIONS ON MULTIMEDIA, 2005, 7 (03) : 495 - 506
  • [36] Integration of audio-visual information for multi-speaker multimedia speaker recognition
    Yang, Jichen
    Chen, Fangfan
    Cheng, Yu
    Lin, Pei
    DIGITAL SIGNAL PROCESSING, 2024, 145
  • [37] Audio-visual fuzzy fusion for robust speech recognition
    Malcangi, M.
    Ouazzane, K.
    Patel, P.
    2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
  • [38] Audio-visual speech recognition using lstm and cnn
    El Maghraby E.E.
    Gody A.M.
    Farouk M.H.
    Recent Advances in Computer Science and Communications, 2021, 14 (06) : 2023 - 2039
  • [39] Building a data corpus for audio-visual speech recognition
    Chitu, Alin G.
    Rothkrantz, Leon J. M.
    EUROMEDIA '2007, 2007, : 88 - 92
  • [40] Audio-Visual Automatic Speech Recognition for Connected Digits
    Wang, Xiaoping
    Hao, Yufeng
    Fu, Degang
    Yuan, Chunwei
    2008 INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL III, PROCEEDINGS, 2008, : 328 - +