Robust speech recognition system for communication robots in real environments

Times cited: 7

Authors:

Ishi, Carlos Toshinori [1]

Matsuda, Shigeki [2]

Kanda, Takayuki [1]

Jitsuhiro, Takatoshi [3]

Ishiguro, Hiroshi [1]

Nakamura, Satoshi [2]

Hagita, Norihiro [1]

Affiliations:

[1] ATR, Intelligent Robot & Commun Labs, Kyoto, Japan

[2] Natl Inst Informat & Commun Technol, Spoken Language Commun Res Lab, ATR, Kyoto, Japan

[3] Knowledge Sci Lab, ATR, Kyoto, Japan

Source:

2006 6TH IEEE-RAS INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS, VOLS 1 AND 2 | 2006

Keywords:

communication robots; speech recognition; robustness; acoustic noise; children speech

DOI:

10.1109/ICHR.2006.321294

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

The application range of communication robots could be widely expanded by an automatic speech recognition (ASR) system with improved robustness to noise and to speakers of different ages. In this paper, we describe an ASR system that can robustly recognize speech by adults and children in noisy environments, and we evaluate it in a communication robot placed in a real noisy environment. Speech is captured using a twelve-element microphone array arranged on the robot's chest. To suppress interference and noise and to attenuate reverberation, we implemented a multi-channel system consisting of an outlier-robust generalized sidelobe canceller (RGSC) and a feature-space noise suppression based on the minimum mean-square error (MMSE) criterion. Speech activity periods are detected using Gaussian mixture model (GMM)-based end-point detection (GMM-EPD). Our ASR system has two decoders, one for adults' speech and one for children's speech, and the final hypothesis is selected based on posterior probability. We then assign a generalized word posterior probability (GWPP)-based confidence measure to this hypothesis and, if the confidence is higher than a threshold, transfer the hypothesis to a subsequent dialog processing module. The performance of each step was evaluated for adults' and children's speech by adding different levels of real environment noise recorded in a cafeteria. Experimental results indicated that our ASR system could achieve over 80% word accuracy in 70 dBA noise. Further evaluation of adult speech recorded in a real noisy environment resulted in 73% word accuracy.
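The hypothesis selection and confidence gating described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each decoder (adult and child) returns an N-best list with a combined acoustic and language-model log score and word time stamps, pools the two lists, approximates hypothesis posteriors with a softmax over scaled log scores, and computes a GWPP-style word posterior by summing the posteriors of every hypothesis that contains the same word in a time-overlapping segment. The names `Hypothesis`, `posteriors`, `word_posterior`, and `select_and_gate`, the `scale` factor, the 0.5 default threshold, and the use of the minimum word posterior as the utterance-level confidence are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    # Hypothetical container: words as (word, start_frame, end_frame) tuples.
    words: list
    log_score: float  # assumed combined acoustic + language-model log score
    decoder: str      # "adult" or "child"


def posteriors(hyps, scale=1.0):
    """Softmax over scaled log scores of the pooled N-best list.

    The scale factor is an assumed stand-in for the score weighting used
    in real GWPP computations."""
    if not hyps:
        raise ValueError("empty N-best list")
    scaled = [scale * h.log_score for h in hyps]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def overlaps(a_start, a_end, b_start, b_end):
    """True if the half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def word_posterior(word, start, end, hyps, post):
    """GWPP-style word posterior: sum the posteriors of all hypotheses that
    contain the same word in a time-overlapping segment (each counted once)."""
    total = 0.0
    for h, p in zip(hyps, post):
        if any(w == word and overlaps(start, end, s, e) for w, s, e in h.words):
            total += p
    return total


def select_and_gate(adult_nbest, child_nbest, threshold=0.5, scale=1.0):
    """Pick the highest-posterior hypothesis across both decoders, then accept
    it only if its confidence exceeds the (hypothetical) threshold."""
    hyps = adult_nbest + child_nbest  # assumption: pool both decoders' N-best lists
    post = posteriors(hyps, scale)
    best_idx = max(range(len(hyps)), key=lambda i: post[i])
    best = hyps[best_idx]
    word_conf = [word_posterior(w, s, e, hyps, post) for w, s, e in best.words]
    # Assumption: utterance confidence = weakest word's posterior.
    confidence = min(word_conf) if word_conf else 0.0
    accepted = confidence > threshold  # only accepted results go to dialog processing
    return best, confidence, accepted
```

Taking the minimum over words means a single uncertain word is enough to reject the utterance before it reaches the dialog module; averaging the word posteriors instead would be a more lenient gate.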


Pages: 340-+

Page count: 2

Related records: 50 in total

[41] ROBUST SPEECH RECOGNITION UNDER NOISY ENVIRONMENTS USING ASYMMETRIC TAPERS
Alam, Md Jahangir
Kenny, Patrick
O'Shaughnessy, Douglas
2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 1638 - 1642
[42] Silence Energy Normalization for Robust Speech Recognition in Additive Noise Environments
Tai, Chung-fu
Hung, Jeih-weih
INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2558 - 2561
[43] Silence Feature Normalization for Robust Speech Recognition in Additive Noise Environments
Wang, Chieh-cheng
Pan, Chi-an
Hung, Jeih-weih
INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1028 - 1031
[44] Robust technologies towards automatic speech recognition in car noise environments
Ding, Pei
He, Lei
Yan, Xiang
Zhao, Rui
Hao, Jie
2006 8TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-4, 2006, : 776 - +
[45] Blind source extraction for robust speech recognition in multisource noisy environments
Nesta, Francesco
Matassoni, Marco
COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03): : 703 - 725
[46] Robust automatic speech recognition based on neural network in reverberant environments
Bai, L.
Li, H. L.
He, Y. Y.
CIVIL, ARCHITECTURE AND ENVIRONMENTAL ENGINEERING, VOLS 1 AND 2, 2017, : 1319 - 1324
[47] Recursive estimation of time-varying environments for robust speech recognition
Zhao, YX
Wang, SJ
Yen, KC
2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS, 2001, : 225 - 228
[48] Development of noise robust real time automatic speech recognition system for Kannada language/dialects
Yadava, G. Thimmaraja
Nagaraja, B. G.
Jayanna, H. S.
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 135
[49] Speech Intent Recognition for Robots
Shen, Borui
Inkpen, Diana
PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON MATHEMATICS AND COMPUTERS IN SCIENCES AND IN INDUSTRY (MCSI 2016), 2016, : 185 - 190
[50] Speech recognition in parallel robots
Jun, Tao
PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON ROBOTICS, INTELLIGENT CONTROL AND ARTIFICIAL INTELLIGENCE (RICAI 2019), 2019, : 733 - 737
