Microphone Array Processing for Distant Speech Recognition: Spherical Arrays

Cited by: 0
Authors
McDonough, John [1]
Kumatani, Kenichi [2]
Raj, Bhiksha [3]
Affiliations
[1] Carnegie Mellon Univ, Voci Technol Inc, Pittsburgh, PA 15213 USA
[2] Disney Res, Pittsburgh, PA USA
[3] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
DESIGN;
DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification codes
0808; 0809;
Abstract
Distant speech recognition (DSR) holds out the promise of the most natural human-computer interface because it enables man-machine interaction through speech, without the need to don intrusive body- or head-mounted microphones. With the advent of the Microsoft Kinect, the application of non-uniform linear arrays to the DSR problem has become commonplace, and performance analysis of such arrays is well represented in the literature. Recently, spherical arrays have become the subject of intense research interest in the acoustic array processing community. Such arrays have heretofore been analyzed solely with theoretical metrics under idealized conditions. In this work, we analyze them under realistic conditions. Moreover, we compare a linear array with 64 channels and a total length of 126 cm to a spherical array with 32 channels and a radius of 4.2 cm; these provided word error rates of 9.3% and 10.2%, respectively, on a DSR task. For a speaker positioned at an oblique angle with respect to the linear array, we recorded word error rates of 12.8% and 9.7%, respectively, for the linear and spherical arrays. The compact size and outstanding performance of the spherical array recommend it for space-limited and mobile applications such as home gaming consoles and humanoid robots.
Pages: 10
Related papers
50 in total
  • [31] Distant-talking speech recognition based on a 3-D Viterbi search using a microphone array
    Yamada, T
    Nakamura, S
    Shikano, K
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2002, 10 (02): 48-56
  • [32] Realistic multi-microphone data simulation for distant speech recognition
    Ravanelli, Mirco
    Svaizer, Piergiorgio
    Omologo, Maurizio
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 2786-2790
  • [33] Objective performance analysis of spherical microphone arrays for speech enhancement in rooms
    Peled, Yotam
    Rafaely, Boaz
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2012, 132 (03): 1473-1481
  • [34] A French corpus for distant-microphone speech processing in real homes
    Bertin, Nancy
    Camberlein, Ewen
    Vincent, Emmanuel
    Lebarbenchon, Romain
    Peillon, Stephane
    Lamande, Eric
    Sivasankaran, Sunit
    Birnbot, Frederic
    Illina, Irina
    Tom, Ariane
    Fleury, Sylvain
    Jamet, Eric
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016: 2781-2785
  • [35] RECOGNITION OF OVERLAPPING SPEECH USING DIGITAL MEMS MICROPHONE ARRAYS
    Zwyssig, Erich
    Faubel, Friedrich
    Renals, Steve
    Lincoln, Mike
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 7068-7072
  • [36] COMBINING CEPSTRAL NORMALIZATION AND COCHLEAR IMPLANT-LIKE SPEECH PROCESSING FOR MICROPHONE ARRAY-BASED SPEECH RECOGNITION
    Cong-Thanh Do
    Taghizadeh, Mohammad J.
    Garner, Philip N.
    2012 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2012), 2012: 137-142
  • [37] An efficient and robust speech dereverberation method using spherical microphone array
    Li, Jian
    Ding, Jiance
    Zheng, Chengshi
    Li, Xiaodong
    2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2018
  • [38] Speech recognition in cars by speaker localization using microphone array
    Kondo, Keisuke
    Nagai, Takayuki
    Kaneko, Masahide
    Kurematsu, Akira
    Systems and Computers in Japan, 2003, 34 (08): 1-12
  • [39] Robust continuous speech recognition system based on a microphone array
    Lleida, E
    Fernandez, J
    Masgrau, E
    PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998: 241-244
  • [40] Microphone array based speech recognition with different talker-array positions
    Omologo, M
    Matassoni, M
    Svaizer, P
    Giuliani, D
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997: 227-230