Feature mapping using far-field microphones for distant speech recognition

Cited by: 3
Authors
Himawan, Ivan [1 ]
Motlicek, Petr [1 ]
Imseng, David [1 ]
Sridharan, Sridha [2 ]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Queensland Univ Technol, Brisbane, Qld 4001, Australia
Funding
European Union Horizon 2020;
Keywords
Deep neural network; Bottleneck features; Distant speech recognition; Meetings; AMI corpus; Noise
DOI
10.1016/j.specom.2016.07.003
CLC classification
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Acoustic modeling based on deep architectures has recently achieved remarkable success, with substantial improvements in speech recognition accuracy across several automatic speech recognition (ASR) tasks. For distant speech recognition, multi-channel deep neural network based approaches rely on the powerful modeling capability of the deep neural network (DNN) to learn a suitable representation of distant speech directly from its multi-channel source. In this model-based combination of multiple microphones, features from each channel are concatenated and used together as the input to the DNN, which allows the multi-channel audio to be integrated for acoustic modeling without any pre-processing steps. Despite the powerful modeling capabilities of DNNs, an environmental mismatch due to noise and reverberation may cause severe performance degradation when features are fed to the DNN without a feature enhancement step. In this paper, we introduce a nonlinear bottleneck feature mapping approach using a DNN to transform noisy and reverberant features to their clean version. The bottleneck features derived from the DNN are used as a teacher signal because they contain information relevant to phoneme classification, and the mapping is performed with the objective of suppressing noise and reverberation. The individual and combined impacts of beamforming and speaker adaptation techniques, along with the feature mapping, are examined for distant large-vocabulary speech recognition using single and multiple far-field microphones. As an alternative to beamforming, experiments with concatenating multiple channel features are conducted. The experimental results on the AMI meeting corpus show that the feature mapping, used in combination with beamforming and speaker adaptation, yields a distant speech recognition performance below 50% word error rate (WER) with a DNN acoustic model. (C) 2016 Elsevier B.V. All rights reserved.
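The mapping described in the abstract can be pictured as a regression DNN whose input is the concatenation of noisy, reverberant features from several far-field channels and whose target is the bottleneck (teacher) representation extracted from clean speech. The following is a minimal Python/PyTorch sketch of such a mapping network; the channel count, feature and bottleneck dimensions, layer sizes, optimizer settings, and the name FeatureMappingDNN are illustrative assumptions, not the configuration reported in the paper.

# Illustrative sketch of DNN-based bottleneck feature mapping.
# All dimensions and hyper-parameters are assumptions for demonstration only.
import torch
import torch.nn as nn

N_CHANNELS = 4   # far-field microphones whose features are concatenated (assumed)
FEAT_DIM = 40    # per-channel acoustic features, e.g. log-mel filterbanks (assumed)
BN_DIM = 39      # dimensionality of the clean bottleneck "teacher" features (assumed)

class FeatureMappingDNN(nn.Module):
    """Maps concatenated multi-channel noisy features to clean bottleneck features."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_CHANNELS * FEAT_DIM, 1024), nn.Sigmoid(),
            nn.Linear(1024, 1024), nn.Sigmoid(),
            nn.Linear(1024, BN_DIM),   # linear output layer: regression, not classification
        )

    def forward(self, x):
        return self.net(x)

model = FeatureMappingDNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()   # minimise squared error against the clean teacher signal

# Toy training step: random tensors stand in for real parallel noisy/clean frames.
noisy_frames = torch.randn(256, N_CHANNELS * FEAT_DIM)   # concatenated noisy features
clean_bottleneck = torch.randn(256, BN_DIM)              # clean bottleneck targets

optimizer.zero_grad()
loss = criterion(model(noisy_frames), clean_bottleneck)
loss.backward()
optimizer.step()
print(f"training loss: {loss.item():.4f}")

At test time, the mapped outputs would replace (or augment) the noisy features fed to the acoustic model, optionally after beamforming and speaker adaptation as studied in the paper.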
Pages: 1-9