Affine-invariant visual features contain supplementary information to enhance speech recognition

Authors
Gurbuz, S [1]
Patterson, E [1]
Tufekci, Z [1]
Gowdy, JN [1]
Affiliations
[1] Clemson Univ, Dept Elect & Comp Engn, Clemson, SC 29634 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The performance of audio-based speech recognition systems degrades severely when there is a mismatch between training and usage environments due to background noise. This degradation is due to a loss of ability to extract and distinguish important information from audio features. One of the emerging techniques for dealing with this problem is the addition of visual features in a multimodal recognition system. This paper presents an affine-invariant, multimodal speech recognition system and focuses on the supplementary information that is available from video features.
Pages: 175-181
Page count: 7