Affine-invariant visual features contain supplementary information to enhance speech recognition

Cited by: 0
Authors
Gurbuz, S [1]
Patterson, E [1]
Tufekci, Z [1]
Gowdy, JN [1]
Affiliations
[1] Clemson Univ, Dept Elect & Comp Engn, Clemson, SC 29634 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The performance of audio-based speech recognition systems degrades severely when there is a mismatch between training and usage environments due to background noise. This degradation is due to a loss of ability to extract and distinguish important information from audio features. One of the emerging techniques for dealing with this problem is the addition of visual features in a multimodal recognition system. This paper presents an affine-invariant, multimodal speech recognition system and focuses on the supplementary information that is available from video features.
Pages: 175-181
Page count: 7
Related papers
50 in total
  • [31] Visual speech information for face recognition
    Rosenblum, LD
    Yakel, DA
    Baseer, N
    Panchal, A
    Nodarse, BC
    Niehus, RP
    PERCEPTION & PSYCHOPHYSICS, 2002, 64 (02): 220-229
  • [32] ADVERSARIAL LEARNING OF RAW SPEECH FEATURES FOR DOMAIN INVARIANT SPEECH RECOGNITION
    Tripathi, Aditay
    Mohan, Aanchan
    Anand, Saket
    Singh, Maneesh
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 5959-5963
  • [33] Speaker-Invariant Features for Automatic Speech Recognition
    Umesh, S.
    Sanand, D. R.
    Praveen, G.
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007: 1738-1743
  • [34] DESIGNING RELEVANT FEATURES FOR VISUAL SPEECH RECOGNITION
    Benhaim, Eric
    Sahbi, Hichem
    Vitte, Guillaume
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 2420-2424
  • [35] An evaluation of visual speech features for the tasks of speech and speaker recognition
    Lucey, S
    AUDIO-BASED AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2003, 2688: 260-267
  • [36] Inclusion of temporal information into features for speech recognition
    Milner, B
    ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996: 256-259
  • [37] Vocal tract length invariant features for automatic speech recognition
    Mertins, A
    Rademacher, J
    2005 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2005: 308-312
  • [38] Frequency-warping invariant features for automatic speech recognition
    Mertins, Alfred
    Rademacher, Jan
    2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006: 5883-5886
  • [39] Improved Warping-Invariant Features for Automatic Speech Recognition
    Rademacher, Jan
    Waechter, Matthias
    Mertins, Alfred
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006: 1499-1502
  • [40] 3D object modeling and recognition using affine-invariant patches and multi-view spatial constraints
    Rothganger, F
    Lazebnik, S
    Schmid, C
    Ponce, J
    2003 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, VOL II, PROCEEDINGS, 2003: 272-277