Affine-invariant visual features contain supplementary information to enhance speech recognition

Cited by: 0

Authors:

Gurbuz, S [1]

Patterson, E [1]

Tufekci, Z [1]

Gowdy, JN [1]

Affiliations:

[1] Clemson Univ, Dept Elect & Comp Engn, Clemson, SC 29634 USA

Source:

AUDIO- AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS | 2001 / Vol. 2091

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The performance of audio-based speech recognition systems degrades severely when there is a mismatch between training and usage environments due to background noise. This degradation is due to a loss of ability to extract and distinguish important information from audio features. One of the emerging techniques for dealing with this problem is the addition of visual features in a multimodal recognition system. This paper presents an affine-invariant, multimodal speech recognition system and focuses on the supplementary information that is available from video features.


Pages: 175-181

Page count: 7
