Evaluation and Optimization of Perceptually-Based ASR Front-End

Times cited: 11
Authors
Junqua, Jean-Claude [1 ]
Wakita, Hisashi [2 ]
Hermansky, Hynek [2 ]
Affiliations
[1] Matsushita Elect Ind Co Ltd, Informat Sci Lab, Cent Res Labs, Osaka 570, Japan
[2] Div Panasonic Technol Inc, Speech Technol Lab, Santa Barbara, CA 93105 USA
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1993 / Vol. 1 / No. 1
DOI
10.1109/89.221366
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Several recently proposed automatic speech recognition (ASR) front-ends are compared experimentally in speaker-dependent and speaker-independent (or cross-speaker) recognition. The perceptually-based linear predictive (PLP) front-end, combined with the root-power sums (RPS) distance measure, generally yields the highest accuracy, especially in cross-speaker recognition. Experiments show that the system can be optimized, and speaker-independent recognition accuracy further improved, by controlling the distance measure's sensitivity to spectral peaks and to spectral tilt and by using the dynamic features of speech. For a digit vocabulary and five reference templates obtained with a clustering algorithm, the optimization improves recognition accuracy from 97% to 98.1% with respect to the PLP_RPS front-end.
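The abstract's two tuning knobs (the distance measure's sensitivity to spectral peaks and to spectral tilt) and its use of dynamic features can be illustrated with a minimal Python sketch. It assumes cepstral vectors c_1..c_p have already been computed by a PLP front-end (not shown) and follows the common description of the RPS distance as a cepstral distance in which the i-th coefficient is weighted by i. The lifter exponent `s`, the cutoff `first` (set to 2 to drop c_1, which largely reflects overall spectral tilt), the delta window `K`, and the delta weight `alpha` are hypothetical parameters chosen for illustration; they are not the paper's notation or its exact optimization procedure.

```python
from typing import List, Sequence

Vector = Sequence[float]


def lifter(cep: Vector, s: float = 1.0, first: int = 1) -> List[float]:
    """Weight cepstral coefficients c_1..c_p by i**s (hypothetical knobs).

    cep[0] holds c_1; the energy term c_0 is assumed to be excluded.
    s = 0 gives the plain (unweighted) cepstral distance, s = 1 the
    index-weighted RPS form; larger s puts more weight on higher-order
    coefficients. Coefficients with index < `first` are zeroed, so
    first = 2 drops c_1, which mostly carries the overall spectral tilt.
    """
    return [0.0 if i < first else (i ** s) * c
            for i, c in enumerate(cep, start=1)]


def liftered_distance(x: Vector, y: Vector,
                      s: float = 1.0, first: int = 1) -> float:
    """Squared Euclidean distance between two liftered cepstral vectors."""
    lx, ly = lifter(x, s, first), lifter(y, s, first)
    return sum((a - b) ** 2 for a, b in zip(lx, ly))


def deltas(frames: Sequence[Vector], K: int = 2) -> List[List[float]]:
    """Regression-based dynamic (delta) features over +/-K frames.

    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), with frame
    indices clamped at the start and end of the utterance.
    """
    T = len(frames)
    denom = 2 * sum(k * k for k in range(1, K + 1))
    out: List[List[float]] = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for k in range(1, K + 1):
            nxt = frames[min(t + k, T - 1)]  # clamp at the last frame
            prv = frames[max(t - k, 0)]      # clamp at the first frame
            for j in range(len(d)):
                d[j] += k * (nxt[j] - prv[j])
        out.append([v / denom for v in d])
    return out


def frame_distance(x: Vector, dx: Vector, y: Vector, dy: Vector,
                   s: float = 1.0, first: int = 1,
                   alpha: float = 1.0) -> float:
    """Hypothetical local distance for template matching: the liftered
    static term plus an alpha-weighted squared distance between deltas."""
    static = liftered_distance(x, y, s, first)
    dynamic = sum((a - b) ** 2 for a, b in zip(dx, dy))
    return static + alpha * dynamic
```

For example, `liftered_distance([1.0, 0.5], [0.0, 0.0], s=1.0)` weights the difference in c_2 twice as heavily as the difference in c_1, whereas `s=0.0` reduces to the unweighted cepstral distance and `first=2` ignores c_1 entirely.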
Pages: 39-48
Number of pages: 10