Using multiple acoustic feature sets for speech recognition

Cited by: 20
Authors
Zolnay, Andras [1]
Kocharov, Daniil
Schlueter, Ralf
Ney, Hermann
Affiliations
[1] Univ Aachen, Rhein Westfal TH Aachen, Lehrstuhl Informat 6, Dept Comp Sci, D-52056 Aachen, Germany
[2] St Petersburg State Univ, Dept Phonet, St Petersburg 199034, Russia
Keywords
acoustic feature extraction; auditory features; articulatory features; voicing; spectrum derivative feature; linear discriminant analysis; discriminative model combination;
DOI
10.1016/j.specom.2007.04.005
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, the use of multiple acoustic feature sets for speech recognition is investigated. The combination of both auditory as well as articulatory motivated features is considered. In addition to a voicing feature, we introduce a recently developed articulatory motivated feature, the spectrum derivative feature. Features are combined both directly using linear discriminant analysis (LDA) as well as indirectly on model level using discriminative model combination (DMC). Experimental results are presented for both small- and large-vocabulary tasks. The results show that the accuracy of automatic speech recognition systems can be significantly improved by the combination of auditory and articulatory motivated features. The word error rate is reduced from 1.8% to 1.5% on the SieTill task for German digit string recognition. Consistent improvements in word error rate have been obtained on two large-vocabulary corpora. The word error rate is reduced from 19.1% to 18.4% on the VerbMobil II corpus, a German large-vocabulary conversational speech task, and from 14.1% to 13.5% on the British English part of the European parliament plenary sessions (EPPS) task from the 2005 TC-STAR ASR evaluation campaign. (C) 2007 Elsevier B.V. All rights reserved.
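The direct combination mentioned in the abstract, where feature sets are merged and then projected with LDA, can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic data, the stream dimensions (a 12-dimensional "auditory" stream and a 1-dimensional "voicing" stream), the three classes, the `ridge` parameter, and the function name `fit_lda` are all assumptions chosen for illustration. The sketch concatenates the two streams per frame and estimates a standard LDA projection from within-class and between-class scatter matrices.

```python
import numpy as np


def fit_lda(features, labels, out_dim, ridge=1e-6):
    """Estimate an LDA projection matrix of shape (in_dim, out_dim).

    Hypothetical helper: solves S_b w = lambda * S_w w by whitening with
    the Cholesky factor of S_w and keeping the leading eigenvectors.
    """
    dim = features.shape[1]
    mean_all = features.mean(axis=0)
    s_w = np.zeros((dim, dim))  # within-class scatter
    s_b = np.zeros((dim, dim))  # between-class scatter
    for c in np.unique(labels):
        x = features[labels == c]
        mean_c = x.mean(axis=0)
        centered = x - mean_c
        s_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        s_b += len(x) * (diff @ diff.T)
    # Small ridge (an assumption) keeps S_w positive definite.
    s_w += ridge * np.eye(dim)
    chol = np.linalg.cholesky(s_w)  # S_w = chol @ chol.T
    chol_inv = np.linalg.inv(chol)
    whitened = chol_inv @ s_b @ chol_inv.T
    eigvals, eigvecs = np.linalg.eigh(whitened)
    order = np.argsort(eigvals)[::-1][:out_dim]  # largest eigenvalues first
    return chol_inv.T @ eigvecs[:, order]


# Synthetic stand-ins for two feature streams (illustrative only).
rng = np.random.default_rng(0)
n_classes, n_per_class = 3, 200
labels = np.repeat(np.arange(n_classes), n_per_class)
auditory = rng.normal(size=(n_classes, 12))[labels] + rng.normal(size=(len(labels), 12))
voicing = rng.normal(size=(n_classes, 1))[labels] + rng.normal(size=(len(labels), 1))

# Direct combination: concatenate the streams frame by frame, then project.
combined = np.hstack([auditory, voicing])
projection = fit_lda(combined, labels, out_dim=2)
reduced = combined @ projection
```

With three classes the between-class scatter has rank at most two, which is why the sketch keeps two output dimensions. The indirect, model-level combination (DMC) named in the abstract is a different technique and is not covered by this sketch.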
Pages: 514-525
Page count: 12
Related papers
50 in total
  • [21] Acoustic model combination for recognition of speech in multiple languages using support vector machines
    Gangashetty, SV
    Sekhar, CC
    Yegnanarayana, B
    2004 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-4, PROCEEDINGS, 2004, : 3065 - 3069
  • [22] Automatic Forest Species Recognition based on Multiple Feature Sets
    Kapp, Marcelo N.
    Bloot, Rodrigo
    Cavalin, Paulo R.
    Oliveira, Luiz E. S.
    PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 1296 - 1303
  • [23] Robust speech recognition with on-line unsupervised acoustic feature compensation
    Buera, Luis
    Miguel, Antonio
    Lleida, Eduardo
    Saz, Oscar
    Ortega, Alfonso
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 105 - 110
  • [24] Parametric density estimation for the classification of acoustic feature vectors in speech recognition
    Basu, S
    Micchelli, CA
    NONLINEAR MODELING: ADVANCED BLACK-BOX TECHNIQUES, 1998, : 87 - 118
  • [25] Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition
    Vogt, T
    André, E
    2005 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1 AND 2, 2005, : 474 - 477
  • [26] Acoustic Model with Multiple Lexicon Types for Indonesian Speech Recognition
    Abidin, Taufik Fuadi
    Misbullah, Alim
    Ferdhiana, Ridha
    Farsiah, Laina
    Aksana, Muammar Zikri
    Riza, Hammam
    APPLIED COMPUTATIONAL INTELLIGENCE AND SOFT COMPUTING, 2022, 2022
  • [27] Face Recognition Based on Fusion of Multiple Masks Local Feature Sets Using Wavelet Transform
    Al-Dabagh, Mustafa Zuhaer Nayef
    Ahmad, Muhammad Imran
    Anwar, Said Amirul
    6TH IEEE INTERNATIONAL CONFERENCE ON RECENT ADVANCES AND INNOVATIONS IN ENGINEERING (ICRAIE), 2021,
  • [28] Multiple frame size and multiple frame rate feature extraction for speech recognition
    Sarada, GL
    Nagarajan, T
    Murthy, HA
    2004 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING & COMMUNICATIONS (SPCOM), 2004, : 592 - 595
  • [29] Feature Mapping of Multiple Beamformed Sources for Robust Overlapping Speech Recognition Using a Microphone Array
    Li, Weifeng
    Wang, Longbiao
    Zhou, Yicong
    Dines, John
    Magimai-Doss, Mathew
    Bourlard, Herve
    Liao, Qingmin
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2014, 22 (12) : 2244 - 2255
  • [30] Speech emotion recognition using a novel feature set
    Yang, J. (jsjyj0801@163.com), 1600, Binary Information Press, P.O. Box 162, Bethel, CT 06801-0162, United States (09):