An audio-visual approach to simultaneous-speaker speech recognition

Cited by: 0
Authors
Patterson, EK [1]
Gowdy, JN [1]
Affiliations
[1] Univ N Carolina, Dept Comp Sci, Wilmington, NC 28403 USA
Keywords
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Audio-visual speech recognition is an area with great potential to help solve challenging problems in speech processing. Difficulties due to background noises are significantly reduced by the additional information provided by extra visual features. The presence of additional speech from other talkers during recording may be viewed as one of the most difficult sources of noise. This paper presents a study using audio-visual speech recognition for simultaneous-speaker speech recognition. The desired goal is to separate and potentially recognize speech from several simultaneous speakers. Speaker pairs from the CUAVE multimodal speech corpus are used in this work. Audio-visual techniques are compared against speaker-independent and speaker-dependent audio-only methods for speech recognition of individuals from these pairs. For information on obtaining CUAVE, please visit the following web page (http://ece.clemson.edu/speech).
Pages: 780-783
Page count: 4
Related papers
50 in total
  • [11] Audio-visual integration for speech recognition
    Kober, R
    Harz, U
    NEUROLOGY PSYCHIATRY AND BRAIN RESEARCH, 1996, 4 (04) : 179 - 184
  • [12] MULTIPOSE AUDIO-VISUAL SPEECH RECOGNITION
    Estellers, Virginia
    Thiran, Jean-Philippe
    19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1065 - 1069
  • [13] Audio-visual speech recognition by speechreading
    Zhang, XZ
    Mersereau, RM
    Clements, MA
    DSP 2002: 14TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING PROCEEDINGS, VOLS 1 AND 2, 2002, : 1069 - 1072
  • [14] Multi-Speaker Audio-Visual Corpus RUSAVIC: Russian Audio-Visual Speech in Cars
    Ivanko, Denis
    Ryumin, Dmitry
    Axyonov, Alexandr
    Kashevnik, Alexey
    Karpov, Alexey
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 1555 - 1559
  • [15] An audio-visual speech recognition system for testing new audio-visual databases
    Pao, Tsang-Long
    Liao, Wen-Yuan
    VISAPP 2006: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, 2006, : 192 - +
  • [16] LEARNING CONTEXTUALLY FUSED AUDIO-VISUAL REPRESENTATIONS FOR AUDIO-VISUAL SPEECH RECOGNITION
    Zhang, Zi-Qiang
    Zhang, Jie
    Zhang, Jian-Shu
    Wu, Ming-Hui
    Fang, Xin
    Dai, Li-Rong
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1346 - 1350
  • [17] Audio-Visual Speech Recognition in Noisy Audio Environments
    Palecek, Karel
    Chaloupka, Josef
    2013 36TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2013, : 484 - 487
  • [18] A Bayesian approach to audio-visual speaker identification
    Nefian, AV
    Liang, LH
    Fu, TY
    Liu, XX
    AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2003, 2688 : 761 - 769
  • [19] Audio-Visual Speech Modeling for Continuous Speech Recognition
    Dupont, Stephane
    Luettin, Juergen
    IEEE TRANSACTIONS ON MULTIMEDIA, 2000, 2 (03) : 141 - 151
  • [20] Speaker and digit recognition by audio-visual lip biometrics
    Faraj, Maycel Isaac
    Bigun, Josef
    ADVANCES IN BIOMETRICS, PROCEEDINGS, 2007, 4642 : 1016 - +