Using multiple acoustic feature sets for speech recognition

Cited by: 20
Authors
Zolnay, Andras [1]
Kocharov, Daniil
Schlueter, Ralf
Ney, Hermann
Affiliations
[1] Univ Aachen, Rhein Westfal TH Aachen, Lehrstuhl Informat 6, Dept Comp Sci, D-52056 Aachen, Germany
[2] St Petersburg State Univ, Dept Phonet, St Petersburg 199034, Russia
Keywords
acoustic feature extraction; auditory features; articulatory features; voicing; spectrum derivative feature; linear discriminant analysis; discriminative model combination
DOI
10.1016/j.specom.2007.04.005
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, the use of multiple acoustic feature sets for speech recognition is investigated. The combination of auditory and articulatory motivated features is considered. In addition to a voicing feature, we introduce a recently developed articulatory motivated feature, the spectrum derivative feature. Features are combined both directly, using linear discriminant analysis (LDA), and indirectly at the model level, using discriminative model combination (DMC). Experimental results are presented for both small- and large-vocabulary tasks. The results show that the accuracy of automatic speech recognition systems can be significantly improved by combining auditory and articulatory motivated features. The word error rate is reduced from 1.8% to 1.5% on the SieTill task for German digit string recognition. Consistent improvements in word error rate have been obtained on two large-vocabulary corpora. The word error rate is reduced from 19.1% to 18.4% on the VerbMobil II corpus, a German large-vocabulary conversational speech task, and from 14.1% to 13.5% on the British English part of the European Parliament Plenary Sessions (EPPS) task from the 2005 TC-STAR ASR evaluation campaign. © 2007 Elsevier B.V. All rights reserved.
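The abstract reports absolute word error rates only. As a reading aid, the short sketch below converts the three quoted pairs into relative reductions. The names `RESULTS` and `relative_reduction` are illustrative and do not come from the paper; the input numbers are copied from the abstract.

```python
# Word error rates (%) quoted in the abstract: (baseline, combined features).
RESULTS = {
    "SieTill": (1.8, 1.5),
    "VerbMobil II": (19.1, 18.4),
    "EPPS (British English)": (14.1, 13.5),
}


def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative WER reduction in percent."""
    if baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline - improved) / baseline


if __name__ == "__main__":
    for task, (base, new) in RESULTS.items():
        rel = relative_reduction(base, new)
        print(f"{task}: {base}% -> {new}% ({rel:.1f}% relative)")
```

Under this calculation the relative reductions come to roughly 16.7% on SieTill, 3.7% on VerbMobil II, and 4.3% on EPPS, so the gain is proportionally largest on the small-vocabulary digit task.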
Pages: 514-525
Page count: 12
Related papers
50 in total
  • [41] Automatic speech recognition using acoustic doppler signal
    Lee, Ki-Seung
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2016, 35 (01): : 74 - 82
  • [42] Disordered Speech Recognition Using Acoustic and sEMG Signals
    Deng, Yunbin
    Patel, Rupal
    Heaton, James T.
    Colby, Glen
    Gilmore, L. Donald
    Cabrera, Joao
    Roy, Serge H.
    De Luca, Carlo J.
    Meltzner, Geoffrey S.
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 632 - +
  • [43] Acoustic feature analysis and discriminative modeling of filled pauses for spontaneous speech recognition
    Wu, CH
    Yan, GL
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2004, 36 (2-3): : 91 - 104
  • [44] Acoustic Feature Analysis and Discriminative Modeling of Filled Pauses for Spontaneous Speech Recognition
    Chung-Hsien Wu
    Gwo-Lang Yan
    Journal of VLSI signal processing systems for signal, image and video technology, 2004, 36 : 91 - 104
  • [45] Examining Vocal Tract Coordination in Childhood Apraxia of Speech with Acoustic-to-Articulatory Speech Inversion Feature Sets
    Benway, Nina R.
    Preston, Jonathan L.
    Espy-Wilson, Carol
    INTERSPEECH 2024, 2024, : 5138 - 5142
  • [46] Second Language Speech Recognition using Multiple-Pass Decoding with Lexicon Represented by Multiple Reduced Phoneme Sets
    Wang, Xianyun
    Yamamoto, Seiichi
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1265 - 1269
  • [47] A Comparison of Machine Learning Algorithms and Feature Sets for Automatic Vocal Emotion Recognition in Speech
    Dogdu, Cem
    Kessler, Thomas
    Schneider, Dana
    Shadaydeh, Maha
    Schweinberger, Stefan R.
    SENSORS, 2022, 22 (19)
  • [48] Speech Emotion Recognition Using Multiple Classifiers
    Wang, Kunxia
    Chu, Zongcheng
    Wang, Kai
    Yu, Tongqing
    Liu, Li
    WEB AND BIG DATA, 2017, 10612 : 84 - 93
  • [49] Combining Multiple Acoustic Models in GMM Spaces for Robust Speech Recognition
    Kang, Byung Ok
    Kwon, Oh-Wook
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (03): : 724 - 730
  • [50] A Study of Bootstrapping with Multiple Acoustic Features for Improved Automatic Speech Recognition
    Cui, Xiaodong
    Xue, Jian
    Xiang, Bing
    Zhou, Bowen
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 248 - 251