Microphone Array Processing for Distant Speech Recognition: Spherical Arrays

Citations: 0
Authors
McDonough, John [1]
Kumatani, Kenichi [2]
Raj, Bhiksha [3]
Affiliations
[1] Carnegie Mellon Univ, Voci Technol Inc, Pittsburgh, PA 15213 USA
[2] Disney Res, Pittsburgh, PA USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
DESIGN;
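DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification code
0808; 0809;
Abstract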
Distant speech recognition (DSR) holds out the promise of the most natural human-computer interface because it enables man-machine interaction through speech, without the necessity of donning intrusive body- or head-mounted microphones. With the advent of the Microsoft Kinect, the application of non-uniform linear arrays to the DSR problem has become commonplace. Performance analysis of such arrays is well represented in the literature. Recently, spherical arrays have become the subject of intense research interest in the acoustic array processing community. Such arrays have heretofore been analyzed solely with theoretical metrics under idealized conditions. In this work, we analyze such arrays under realistic conditions. Moreover, we compare a linear array with 64 channels and a total length of 126 cm to a spherical array with 32 channels and a radius of 4.2 cm; we found that these provided word error rates of 9.3% and 10.2%, respectively, on a DSR task. For a speaker positioned at an oblique angle with respect to the linear array, we recorded error rates of 12.8% and 9.7%, respectively, for the linear and spherical arrays. The compact size and outstanding performance of the spherical array recommend it for space-limited and mobile applications such as home-gaming consoles and humanoid robots.
Pages: 10
Related papers
50 in total
  • [41] Processing of speech signals using a microphone array for intelligent robots
    Hu, I
    Cheng, CC
    Liu, WH
    PROCEEDINGS OF THE INSTITUTION OF MECHANICAL ENGINEERS PART I-JOURNAL OF SYSTEMS AND CONTROL ENGINEERING, 2005, 219 (I2) : 133 - 143
  • [42] A microphone array processing technique for speech enhancement in a reverberant space
    Liu, QG
    Champagne, B
    Kabal, P
    SPEECH COMMUNICATION, 1996, 18 (04) : 317 - 334
  • [43] Two-channel microphone array processing for speech enhancement
    Yan, ZL
    Du, LM
    Wei, JQ
    Zeng, H
    PROCEEDINGS OF THE 2003 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL II: COMMUNICATIONS-MULTIMEDIA SYSTEMS & APPLICATIONS, 2003, : 548 - 551
  • [44] A signal subspace tracking algorithm for microphone array processing of speech
    Affes, S
    Grenier, Y
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1997, 5 (05) : 425 - 437
  • [45] Calibration, optimization, and DSP implementation of microphone array for speech processing
    Wang, A
    Yao, K
    Hudson, RE
    Korompis, D
    Lorenzelli, F
    Soli, SD
    Gao, S
    VLSI SIGNAL PROCESSING, IX, 1996, : 221 - 230
  • [46] Recurrent Models for Auditory Attention in Multi-Microphone Distant Speech Recognition
    Kim, Suyoun
    Lane, Ian
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3838 - 3842
  • [47] SPEAKER IDENTIFICATION WITH DISTANT MICROPHONE SPEECH
    Jin, Qin
    Li, Runxin
    Yang, Qian
    Laskowski, Kornel
    Schultz, Tanja
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4518 - 4521
  • [48] EXPLOITING INTER-MICROPHONE AGREEMENT FOR HYPOTHESIS COMBINATION IN DISTANT SPEECH RECOGNITION
    Guerrero, Cristina
    Omologo, Maurizio
    2014 PROCEEDINGS OF THE 22ND EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2014, : 2385 - 2389
  • [49] Subband parameter optimization of microphone arrays for speech recognition in reverberant environments
    Seltzer, ML
    Stern, RM
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING I, 2003, : 408 - 411
  • [50] Spatio-temporal processing for distant speech recognition
    Low, SY
    Togneri, R
    Nordholm, S
    2004 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL I, PROCEEDINGS: SPEECH PROCESSING, 2004, : 1001 - 1004