Evaluation and Optimization of Perceptually-Based ASR Front-End

Cited by: 11

Authors:

Junqua, Jean-Claude [1]

Wakita, Hisashi [2]

Hermansky, Hynek [2]

Affiliations:

[1] Matsushita Elect Ind Co Ltd, Informat Sci Lab, Cent Res Labs, Osaka 570, Japan

[2] Div Panasonic Technol Inc, Speech Technol Lab, Santa Barbara, CA 93105 USA

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 1993 / Vol. 1 / Issue 1

Keywords:

DOI:

10.1109/89.221366

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification codes:

070206; 082403;

Abstract:

Several recently proposed automatic speech recognition (ASR) front-ends are experimentally compared in speaker-dependent and speaker-independent (or cross-speaker) recognition. The perceptually based linear predictive (PLP) front-end, with the root-power sums (RPS) distance measure, generally yields the highest accuracies, especially in cross-speaker recognition. Experiments show that the system can be optimized, and accuracy in speaker-independent recognition further improved, by controlling the distance measure's sensitivity to spectral peaks and to spectral tilt and by using dynamic speech features. For a digit vocabulary and five reference templates obtained with a clustering algorithm, the optimization improves recognition accuracy from 97% to 98.1% compared with the PLP_RPS front-end.

Pages: 39-48

Page count: 10
