Feature mapping using far-field microphones for distant speech recognition

Times cited: 3
Authors
Himawan, Ivan [1]
Motlicek, Petr [1]
Imseng, David [1]
Sridharan, Sridha [2]
Affiliations
[1] Idiap Research Institute, Martigny, Switzerland
[2] Queensland University of Technology, Brisbane, Qld 4001, Australia
Funding
European Union Horizon 2020
Keywords
Deep neural network; Bottleneck features; Distant speech recognition; Meetings; AMI corpus; Noise
DOI
10.1016/j.specom.2016.07.003
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Acoustic modeling based on deep architectures has recently achieved remarkable success, substantially improving recognition accuracy in several automatic speech recognition (ASR) tasks. For distant speech recognition, multi-channel deep neural network (DNN) approaches rely on the powerful modeling capability of the DNN to learn a suitable representation of distant speech directly from its multi-channel source. In this model-based combination of multiple microphones, the features from each channel are concatenated and used together as the DNN input, which allows multi-channel audio to be integrated into acoustic modeling without any pre-processing. Despite the powerful modeling capabilities of DNNs, an environmental mismatch due to noise and reverberation may cause severe performance degradation when features are fed to a DNN without a feature enhancement step. In this paper, we introduce a nonlinear bottleneck feature mapping approach that uses a DNN to transform noisy and reverberant features into their clean versions. Bottleneck features derived from the DNN are used as the teacher signal because they contain information relevant to phoneme classification, and the mapping is performed with the objective of suppressing noise and reverberation. The individual and combined effects of beamforming and speaker adaptation, together with feature mapping, are examined for distant large-vocabulary speech recognition using single and multiple far-field microphones. As an alternative to beamforming, experiments that concatenate features from multiple channels are also conducted. Experimental results on the AMI meeting corpus show that feature mapping, combined with beamforming and speaker adaptation, yields a distant speech recognition word error rate (WER) below 50% with a DNN acoustic model. © 2016 Elsevier B.V. All rights reserved.
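The abstract combines two ideas that can be made concrete in a few lines: channel concatenation (splicing each microphone's frame features with neighbouring frames and joining all channels into one DNN input vector) and a DNN regression that maps those noisy, reverberant inputs onto clean bottleneck-feature targets. The Python/NumPy sketch below is a toy illustration under stated assumptions, not the authors' implementation. The data are synthetic; the `make_toy_data` helper and its random tanh projection are hypothetical stand-ins for a bottleneck extractor applied to clean speech; the splicing context, the single tanh hidden layer, and full-batch gradient descent on mean squared error are illustrative choices. Beamforming and speaker adaptation are not shown.

```python
import numpy as np


def splice(feats, context):
    """Stack each frame with its +/- `context` neighbours (edge frames repeated)."""
    T = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    # Block k holds frame t + k - context, so the middle block is the frame itself.
    return np.hstack([padded[k:k + T] for k in range(2 * context + 1)])


def concat_channels(channels, context=2):
    """Channel concatenation: splice each microphone's features, then join them per frame."""
    T = channels[0].shape[0]
    assert all(c.shape[0] == T for c in channels), "channels must be frame-aligned"
    return np.hstack([splice(c, context) for c in channels])


def make_toy_data(T=400, D=5, n_channels=2, bn_dim=4, seed=1):
    """Hypothetical synthetic data: noisy far-field channels plus clean bottleneck targets."""
    rng = np.random.default_rng(seed)
    clean = rng.normal(size=(T, D))
    # Hypothetical stand-in for a bottleneck extractor applied to clean speech.
    proj = rng.normal(size=(D, bn_dim)) / np.sqrt(D)
    targets = np.tanh(clean @ proj)
    channels = []
    for _ in range(n_channels):
        # Crude "reverberation" (leak of the previous frame) plus independent noise.
        smeared = clean + 0.6 * np.vstack([clean[:1], clean[:-1]])
        channels.append(smeared + 0.5 * rng.normal(size=(T, D)))
    return channels, targets


class FeatureMapper:
    """One-hidden-layer MLP regressing noisy spliced inputs onto clean bottleneck targets."""

    def __init__(self, in_dim, hidden, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, out_dim))
        self.b2 = np.zeros(out_dim)

    def forward(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return h, h @ self.W2 + self.b2

    def loss(self, x, y):
        """Mean squared error between mapped features and targets."""
        _, y_hat = self.forward(x)
        return float(np.mean((y_hat - y) ** 2))

    def train_step(self, x, y, lr=0.2):
        """One full-batch gradient-descent step on the MSE loss."""
        h, y_hat = self.forward(x)
        g_out = 2.0 * (y_hat - y) / y.size           # dLoss/dy_hat for the mean
        g_h = (g_out @ self.W2.T) * (1.0 - h ** 2)   # back-propagate through tanh
        self.W2 -= lr * (h.T @ g_out)
        self.b2 -= lr * g_out.sum(axis=0)
        self.W1 -= lr * (x.T @ g_h)
        self.b1 -= lr * g_h.sum(axis=0)


channels, targets = make_toy_data()
x = concat_channels(channels, context=2)   # shape (T, n_channels * D * (2*context + 1))
mapper = FeatureMapper(x.shape[1], hidden=32, out_dim=targets.shape[1])
before = mapper.loss(x, targets)
for _ in range(500):
    mapper.train_step(x, targets)
print(f"MSE before: {before:.3f}, after: {mapper.loss(x, targets):.3f}")
```

Because the mapper sees every channel and its temporal context at once, it can learn to partly average out the independent per-channel noise and compensate for the one-frame smearing in this toy data.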
Pages: 1-9 (9 pages)