Improvement of robot audition by interfacing sound source separation and automatic speech recognition with missing feature theory

Cited by: 15

Authors:

Yamamoto, S [1]

Nakadai, K [1]

Tsujino, H [1]

Yokoyama, T [1]

Okuno, HG [1]

Affiliations:

[1] Kyoto Univ, Grad Sch Informat, Kyoto, Japan

Source:

2004 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-5, PROCEEDINGS | 2004

Keywords:

DOI:

10.1109/ROBOT.2004.1308039

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

We have developed a robot audition system using the active direction-pass filter (ADPF) with the Scattering Theory, and demonstrated that the humanoid SIG could separate and recognize three simultaneous speeches originating from different directions. This is the first result showing that a robot can listen to several things simultaneously. However, its general applicability to other robots has not yet been confirmed. Since automatic speech recognition (ASR) requires direction- and speaker-dependent acoustic models, it is difficult to adapt to various kinds of environments. In addition, ASR with many acoustic models is slow. In this paper, these three problems are resolved. First, we confirmed the generality of the ADPF by applying it to two humanoids, SIG2 and Replie, under different environments. Next, we present a new interface between the ADPF and ASR based on the Missing Feature Theory, which masks broken features of separated sound to make them unavailable to ASR. This new interface improved the recognition performance for three simultaneous speeches up to about 90%. Finally, since the ASR uses only a single acoustic model that is direction- and speaker-independent and created under clean environments, the processing of the whole system was made very light and fast.


Pages: 1517 - 1523

Number of pages: 7

39 records in total

[1] An Improvement in Automatic Speech Recognition Using Soft Missing Feature Masks for Robot Audition
Takahashi, Toru
Nakadai, Kazuhiro
Komatani, Kazunori
Ogata, Tetsuya
Okuno, Hiroshi G.
IEEE/RSJ 2010 INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2010), 2010, : 964 - 969
[2] Enhanced robot speech recognition based on microphone array source separation and missing feature theory
Yamamoto, S
Valin, JM
Nakadai, K
Rouat, J
Michaud, F
Ogata, T
Okuno, HG
2005 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), VOLS 1-4, 2005, : 1477 - 1482
[3] Robot Audition: Missing Feature Theory Approach and Active Audition
Okuno, Hiroshi G.
Nakadai, Kazuhiro
Kim, Hyun-Don
ROBOTICS RESEARCH, 2011, 70 : 227 - +
[4] Exploiting known sound source signals to improve ICA-based robot audition in speech separation and recognition
Takeda, Ryu
Nakadai, Kazuhiro
Komatani, Kazunori
Ogata, Tetsuya
Okuno, Hiroshi G.
2007 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-9, 2007, : 1763 - +
[5] SOUND SOURCE SEPARATION OF MOVING SPEAKERS FOR ROBOT AUDITION
Nakadai, Kazuhiro
Nakajima, Hirofumi
Hasegawa, Yuji
Tsujino, Hiroshi
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 3685 - 3688
[6] Sound Source Separation and Automatic Speech Recognition for Moving Sources
Nakadai, Kazuhiro
Nakajima, Hirofumi
Ince, Goekhan
Hasegawa, Yuji
IEEE/RSJ 2010 INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2010), 2010, : 976 - 981
[7] Sound Source Separation for Robot Audition using Deep Learning
Noda, Kuniaki
Hashimoto, Naoya
Nakadai, Kazuhiro
Ogata, Tetsuya
2015 IEEE-RAS 15TH INTERNATIONAL CONFERENCE ON HUMANOID ROBOTS (HUMANOIDS), 2015, : 389 - 394
[8] Design and implementation of a robot audition system for automatic speech recognition of simultaneous speech
Yamamoto, Shun'ichi
Nakadai, Kazuhiro
Nakano, Mikio
Tsujino, Hiroshi
Valin, Jean-Marc
Komatani, Kazunori
Ogata, Tetsuya
Okuno, Hiroshi G.
2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 111 - +
[9] Multiple target recognition based on blind source separation and missing feature theory
Qi, H
Tao, X
Tao, LH
IEEE CAMSAP 2005: FIRST INTERNATIONAL WORKSHOP ON COMPUTATIONAL ADVANCES IN MULTI-SENSOR ADAPTIVE PROCESSING, 2005, : 205 - 208
[10] High performance sound source separation adaptable to environmental changes for robot audition
Nakajima, Hirofumi
Nakadai, Kazuhiro
Hasegawa, Yuuji
Tsujino, Hiroshi
2008 IEEE/RSJ INTERNATIONAL CONFERENCE ON ROBOTS AND INTELLIGENT SYSTEMS, VOLS 1-3, CONFERENCE PROCEEDINGS, 2008, : 2165 - 2171
