Using multiple acoustic feature sets for speech recognition

Authors
Zolnay, Andras [1]
Kocharov, Daniil
Schlueter, Ralf
Ney, Hermann
Affiliations
[1] Univ Aachen, Rhein Westfal TH Aachen, Lehrstuhl Informat 6, Dept Comp Sci, D-52056 Aachen, Germany
[2] St Petersburg State Univ, Dept Phonet, St Petersburg 199034, Russia
Keywords
acoustic feature extraction; auditory features; articulatory features; voicing; spectrum derivative feature; linear discriminant analysis; discriminative model combination
DOI
10.1016/j.specom.2007.04.005
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, the use of multiple acoustic feature sets for speech recognition is investigated. The combination of auditory and articulatory motivated features is considered. In addition to a voicing feature, we introduce a recently developed articulatory motivated feature, the spectrum derivative feature. Features are combined both directly, using linear discriminant analysis (LDA), and indirectly at the model level, using discriminative model combination (DMC). Experimental results are presented for both small- and large-vocabulary tasks. The results show that the accuracy of automatic speech recognition systems can be significantly improved by combining auditory and articulatory motivated features. On the SieTill task for German digit string recognition, the word error rate is reduced from 1.8% to 1.5%. Consistent improvements in word error rate have also been obtained on two large-vocabulary corpora: from 19.1% to 18.4% on the VerbMobil II corpus, a German large-vocabulary conversational speech task, and from 14.1% to 13.5% on the British English part of the European Parliament Plenary Sessions (EPPS) task from the 2005 TC-STAR ASR evaluation campaign. © 2007 Elsevier B.V. All rights reserved.
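The direct combination step described in the abstract (concatenating frame-level feature streams and reducing them with LDA) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature streams, their dimensions, the class labels (standing in for something like state alignments), and the function name `lda_projection` are all hypothetical, and the paper's actual configuration (frame context, output dimensionality, training labels) is not given in this record.

```python
import numpy as np


def lda_projection(x: np.ndarray, labels: np.ndarray, out_dim: int,
                   ridge: float = 1e-6) -> np.ndarray:
    """Hypothetical helper: estimate an LDA projection from labelled frames.

    x: (n_frames, dim) features; labels: (n_frames,) class ids.
    Returns a (dim, out_dim) matrix whose columns solve S_b v = lambda S_w v,
    ordered by decreasing lambda (between-class vs within-class scatter).
    """
    dim = x.shape[1]
    global_mean = x.mean(axis=0)
    s_w = np.zeros((dim, dim))  # within-class scatter
    s_b = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(labels):
        xc = x[labels == c]
        class_mean = xc.mean(axis=0)
        centered = xc - class_mean
        s_w += centered.T @ centered
        diff = (class_mean - global_mean)[:, None]
        s_b += len(xc) * (diff @ diff.T)
    s_w += ridge * np.eye(dim)  # small ridge keeps S_w positive definite

    # Whiten with the symmetric S_w^{-1/2}, then diagonalize the whitened S_b.
    w_vals, w_vecs = np.linalg.eigh(s_w)
    whiten = w_vecs @ np.diag(1.0 / np.sqrt(w_vals)) @ w_vecs.T
    b_vals, b_vecs = np.linalg.eigh(whiten @ s_b @ whiten)
    order = np.argsort(b_vals)[::-1][:out_dim]  # eigh returns ascending order
    return whiten @ b_vecs[:, order]


# Synthetic stand-ins for the feature streams (assumed, not from the paper):
# a 16-dim auditory (MFCC-like) stream, a 1-dim voicing stream and a 1-dim
# spectrum-derivative-like stream, all with class-dependent means.
rng = np.random.default_rng(0)
n_frames, n_classes = 600, 4
labels = rng.integers(0, n_classes, size=n_frames)
class_means = rng.normal(size=(n_classes, 18))
frames = class_means[labels] + rng.normal(size=(n_frames, 18))
mfcc, voicing, sdf = frames[:, :16], frames[:, 16:17], frames[:, 17:18]

# Direct combination: concatenate the streams, then project with LDA.
combined = np.concatenate([mfcc, voicing, sdf], axis=1)
w = lda_projection(combined, labels, out_dim=3)
projected = combined @ w
print(combined.shape, projected.shape)  # (600, 18) (600, 3)
```

With four classes LDA yields at most three informative directions, which is why the sketch projects to three dimensions. The indirect route (DMC) instead keeps separate models per feature set and combines their scores log-linearly with discriminatively trained weights; that training step is not sketched here.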
Pages: 514-525
Page count: 12