Feature mapping using far-field microphones for distant speech recognition

Cited by: 3
Authors
Himawan, Ivan [1]
Motlicek, Petr [1]
Imseng, David [1]
Sridharan, Sridha [2]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Queensland Univ Technol, Brisbane, Qld 4001, Australia
Funding
EU Horizon 2020
Keywords
Deep neural network; Bottleneck features; Distant speech recognition; Meetings; AMI corpus; NOISE;
DOI
10.1016/j.specom.2016.07.003
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification code
070206 ; 082403 ;
Abstract
Acoustic modeling based on deep architectures has recently achieved remarkable success, substantially improving recognition accuracy in several automatic speech recognition (ASR) tasks. For distant speech recognition, multi-channel approaches based on deep neural networks (DNNs) rely on the modeling capability of the DNN to learn a suitable representation of distant speech directly from its multi-channel source. In this model-based combination of multiple microphones, the features from each channel are concatenated and used together as input to the DNN. This allows multi-channel audio to be integrated into acoustic modeling without any pre-processing steps. Despite the powerful modeling capabilities of DNNs, an environmental mismatch due to noise and reverberation may cause severe performance degradation when features are simply fed to a DNN without a feature enhancement step. In this paper, we introduce a nonlinear bottleneck feature mapping approach that uses a DNN to transform noisy and reverberant features into their clean versions. The bottleneck features derived from the DNN are used as a teacher signal because they contain information relevant to phoneme classification, and the mapping is performed with the objective of suppressing noise and reverberation. The individual and combined impacts of beamforming and speaker adaptation techniques, along with the feature mapping, are examined for distant large-vocabulary speech recognition using single and multiple far-field microphones. As an alternative to beamforming, experiments with concatenated features from multiple channels are conducted. The experimental results on the AMI meeting corpus show that the feature mapping, used in combination with beamforming and speaker adaptation, yields a distant speech recognition performance below 50% word error rate (WER), using a DNN for acoustic modeling. © 2016 Elsevier B.V. All rights reserved.
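The two ideas in the abstract, concatenating per-channel features and regressing them onto bottleneck targets, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the tanh activation, the mean-squared-error objective, and all function names (`concat_channels`, `init_mlp`, `forward`, `mse`) are assumptions chosen for illustration, and random arrays stand in for real far-field features and teacher bottleneck features.

```python
import numpy as np


def concat_channels(channel_feats):
    """Join per-channel feature matrices of shape (frames, dims) along the feature axis."""
    return np.concatenate(channel_feats, axis=1)


def init_mlp(sizes, rng):
    """Create (weight, bias) pairs for a small feed-forward network (hypothetical sizes)."""
    return [
        (rng.standard_normal((m, n)) * np.sqrt(1.0 / m), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """Forward pass: tanh on hidden layers, linear output for regression."""
    h = x
    for i, (w, b) in enumerate(params):
        h = h @ w + b
        if i < len(params) - 1:
            h = np.tanh(h)
    return h


def mse(pred, target):
    """Mean squared error between mapped features and the teacher features."""
    return float(np.mean((pred - target) ** 2))


rng = np.random.default_rng(0)
frames, dims, n_channels, bn_dim = 20, 13, 4, 8  # assumed toy dimensions

# Random stand-ins for far-field features from each microphone channel.
far_field = [rng.standard_normal((frames, dims)) for _ in range(n_channels)]
# Random stand-in for the teacher signal (bottleneck features of clean speech).
teacher_bn = rng.standard_normal((frames, bn_dim))

x = concat_channels(far_field)                    # shape (frames, n_channels * dims)
params = init_mlp([x.shape[1], 32, bn_dim], rng)  # untrained mapping network
pred = forward(params, x)                         # mapped features, shape (frames, bn_dim)
loss = mse(pred, teacher_bn)                      # quantity a training loop would minimize
```

In a real system the network would be trained by gradient descent on this loss over parallel far-field and close-talk recordings; the sketch shows only the data flow and the regression objective.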
Pages: 1-9
Number of pages: 9