Feature mapping using far-field microphones for distant speech recognition

Cited by: 3
Authors
Himawan, Ivan [1 ]
Motlicek, Petr [1 ]
Imseng, David [1 ]
Sridharan, Sridha [2 ]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Queensland Univ Technol, Brisbane, Qld 4001, Australia
Funding
European Union Horizon 2020;
Keywords
Deep neural network; Bottleneck features; Distant speech recognition; Meetings; AMI corpus; Noise
DOI
10.1016/j.specom.2016.07.003
CLC classification
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Acoustic modeling based on deep architectures has recently achieved remarkable success, with substantial improvements in speech recognition accuracy across several automatic speech recognition (ASR) tasks. For distant speech recognition, multi-channel deep neural network based approaches rely on the powerful modeling capability of the deep neural network (DNN) to learn a suitable representation of distant speech directly from its multi-channel source. In this model-based combination of multiple microphones, features from each channel are concatenated and used together as the input to the DNN, which allows the multi-channel audio to be integrated for acoustic modeling without any pre-processing steps. Despite the powerful modeling capabilities of DNNs, an environmental mismatch due to noise and reverberation may cause severe performance degradation when features are fed to the DNN without a feature enhancement step. In this paper, we introduce a nonlinear bottleneck feature mapping approach using a DNN to transform noisy and reverberant features to their clean version. The bottleneck features derived from the DNN are used as a teacher signal because they contain information relevant to phoneme classification, and the mapping is performed with the objective of suppressing noise and reverberation. The individual and combined impacts of beamforming and speaker adaptation techniques, along with the feature mapping, are examined for distant large-vocabulary speech recognition using single and multiple far-field microphones. As an alternative to beamforming, experiments with concatenating multiple channel features are conducted. The experimental results on the AMI meeting corpus show that the feature mapping, used in combination with beamforming and speaker adaptation, yields a distant speech recognition performance below 50% word error rate (WER) with a DNN acoustic model. (C) 2016 Elsevier B.V. All rights reserved.
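The mapping described in the abstract can be pictured as a regression DNN whose input is the concatenation of noisy, reverberant features from several far-field channels and whose target is the bottleneck (teacher) representation extracted from clean speech. The following is a minimal Python/PyTorch sketch of such a mapping network; the channel count, feature and bottleneck dimensions, layer sizes, optimizer settings, and the name FeatureMappingDNN are illustrative assumptions, not the configuration reported in the paper.

# Illustrative sketch of DNN-based bottleneck feature mapping.
# All dimensions and hyper-parameters are assumptions for demonstration only.
import torch
import torch.nn as nn

N_CHANNELS = 4   # far-field microphones whose features are concatenated (assumed)
FEAT_DIM = 40    # per-channel acoustic features, e.g. log-mel filterbanks (assumed)
BN_DIM = 39      # dimensionality of the clean bottleneck "teacher" features (assumed)

class FeatureMappingDNN(nn.Module):
    """Maps concatenated multi-channel noisy features to clean bottleneck features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_CHANNELS * FEAT_DIM, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
            nn.Linear(1024, BN_DIM),   # linear output layer: regression, not classification
        )

    def forward(self, x):
        return self.net(x)

model = FeatureMappingDNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()   # minimise squared error against the clean teacher signal

# Toy training step: random tensors stand in for real parallel noisy/clean frames.
noisy_frames = torch.randn(256, N_CHANNELS * FEAT_DIM)   # concatenated noisy features
clean_bottleneck = torch.randn(256, BN_DIM)              # clean bottleneck targets

optimizer.zero_grad()
loss = criterion(model(noisy_frames), clean_bottleneck)
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.4f}")

At test time, the mapped outputs would replace (or augment) the noisy features fed to the acoustic model, optionally after beamforming and speaker adaptation as studied in the paper.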
Pages: 1-9