Robust speech recognition system for communication robots in real environments

Cited by: 7

Authors:

Ishi, Carlos Toshinori [1]

Matsuda, Shigeki [2]

Kanda, Takayuki [1]

Jitsuhiro, Takatoshi [3]

Ishiguro, Hiroshi [1]

Nakamura, Satoshi [2]

Hagita, Norihiro [1]

Affiliations:

[1] ATR, Intelligent Robot & Commun Labs, Kyoto, Japan

[2] Natl Inst Informat & Commun Technol, Spoken Language Commun Res Lab, ATR, Kyoto, Japan

[3] Knowledge Sci Lab, ATR, Kyoto, Japan

Source:

2006 6TH IEEE-RAS INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS, VOLS 1 AND 2 | 2006

Keywords:

communication robots; speech recognition; robustness; acoustic noise; children speech;

DOI:

10.1109/ICHR.2006.321294

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

The application range of communication robots could be widely expanded by the use of an automatic speech recognition (ASR) system with improved robustness for noise and for speakers of different ages. In this paper, we describe an ASR system which can robustly recognize speech by adults and children in noisy environments. We evaluate the ASR system in a communication robot placed in a real noisy environment. Speech is captured using a twelve-element microphone array arranged in the robot chest. To suppress interference and noise and to attenuate reverberation, we implemented a multi-channel system consisting of an outlier-robust generalized sidelobe canceller (RGSC) technique and a feature-space noise suppression using MMSE criteria. Speech activity periods are detected using GMM-based end-point detection (GMM-EPD). Our ASR system has two decoders for adults' and children's speech. The final hypothesis is selected based on posterior probability. We then assign a generalized word posterior probability (GWPP)-based confidence measure to this hypothesis, and if it is higher than a threshold, we transfer it to a subsequent dialog processing module. The performance of each step was evaluated for adults' and children's speech, by adding different levels of real environment noise recorded in a cafeteria. Experimental results indicated that our ASR system could achieve over 80 % word accuracy in 70 dBA noise. Further evaluation of adult speech recorded in a real noisy environment resulted in 73 % word accuracy.
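The final selection step described in the abstract (two decoders, a choice by posterior probability, then a confidence threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Hypothesis` class, its field names, the `select_hypothesis` function, and the default threshold of 0.5 are all hypothetical, and the `confidence` field only stands in for the GWPP-based measure, whose computation the abstract does not give.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    """One decoder output (hypothetical structure, for illustration only)."""
    text: str
    posterior: float   # assumed decoder-level posterior probability
    confidence: float  # stand-in for the GWPP-based confidence measure


def select_hypothesis(adult: Hypothesis,
                      child: Hypothesis,
                      threshold: float = 0.5) -> Optional[Hypothesis]:
    """Pick the decoder output with the higher posterior, then accept it
    only if its confidence is strictly higher than the threshold."""
    best = adult if adult.posterior >= child.posterior else child
    # A rejected hypothesis is not passed on to dialog processing.
    return best if best.confidence > threshold else None
```

In this sketch, a return value of `None` means that nothing is transferred to the dialog processing module; how the real system handles rejected utterances is not stated in the abstract.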

Pages: 340 / +

Page count: 2