Microphone Array Processing for Distant Speech Recognition: Spherical Arrays

Cited by: 0

Authors:

McDonough, John ^{[1]}

Kumatani, Kenichi ^{[2]}

Raj, Bhiksha ^{[3]}

Affiliations:

[1] Carnegie Mellon Univ, Voci Technol Inc, Pittsburgh, PA 15213 USA

[2] Disney Res, Pittsburgh, PA USA

[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA

Source:

2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2012

Keywords:

DESIGN;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Distant speech recognition (DSR) holds out the promise of the most natural human-computer interface because it enables man-machine interaction through speech, without the necessity of donning intrusive body- or head-mounted microphones. With the advent of the Microsoft Kinect, the application of non-uniform linear arrays to the DSR problem has become commonplace. Performance analysis of such arrays is well represented in the literature. Recently, spherical arrays have become the subject of intense research interest in the acoustic array processing community. Such arrays have heretofore been analyzed solely with theoretical metrics under idealized conditions. In this work, we analyze such arrays under realistic conditions. Moreover, we compare a linear array with 64 channels and a total length of 126 cm to a spherical array with 32 channels and a radius of 4.2 cm; we found that these provided word error rates of 9.3% and 10.2%, respectively, on a DSR task. For a speaker positioned at an oblique angle with respect to the linear array, we recorded error rates of 12.8% and 9.7%, respectively, for the linear and spherical arrays. The compact size and outstanding performance of the spherical array recommend it for space-limited and mobile applications such as home-gaming consoles and humanoid robots.

Citation

Pages: 10

50 items in total

[31] Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array
Yamada, T
Nakamura, S
Shikano, K
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2002, 10 (02): 48 - 56
[32] Realistic multi-microphone data simulation for distant speech recognition
Ravanelli, Mirco
Svaizer, Piergiorgio
Omologo, Maurizio
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 2786 - 2790
[33] Objective performance analysis of spherical microphone arrays for speech enhancement in rooms
Peled, Yotam
Rafaely, Boaz
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2012, 132 (03): 1473 - 1481
[34] A French corpus for distant-microphone speech processing in real homes
Bertin, Nancy
Camberlein, Ewen
Vincent, Emmanuel
Lebarbenchon, Romain
Peillon, Stephane
Lamande, Eric
Sivasankaran, Sunit
Birnbot, Frederic
Illina, Irina
Tom, Ariane
Fleury, Sylvain
Jamet, Eric
17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 2781 - 2785
[35] RECOGNITION OF OVERLAPPING SPEECH USING DIGITAL MEMS MICROPHONE ARRAYS
Zwyssig, Erich
Faubel, Friedrich
Renals, Steve
Lincoln, Mike
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 7068 - 7072
[36] COMBINING CEPSTRAL NORMALIZATION AND COCHLEAR IMPLANT-LIKE SPEECH PROCESSING FOR MICROPHONE ARRAY-BASED SPEECH RECOGNITION
Cong-Thanh Do
Taghizadeh, Mohammad J.
Garner, Philip N.
2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012: 137 - 142
[37] An efficient and robust speech dereverberation method using spherical microphone array
Li, Jian
Ding, Jiance
Zheng, Chengshi
Li, Xiaodong
2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2018
[38] Speech recognition in cars by speaker localization using microphone array
Kondo, Keisuke
Nagai, Takayuki
Kaneko, Masahide
Kurematsu, Akira
Systems and Computers in Japan, 2003, 34 (08): 1 - 12
[39] Robust continuous speech recognition system based on a microphone array
Lleida, E
Fernandez, J
Masgrau, E
PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998: 241 - 244
[40] Microphone array based speech recognition with different talker-array positions
Omologo, M
Matassoni, M
Svaizer, P
Giuliani, D
1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997: 227 - 230
