Factorised representations for neural network adaptation to diverse acoustic environments

Cited by: 4
Authors
Fainberg, Joachim [1]
Renals, Steve [1]
Bell, Peter [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Funding
European Union Horizon 2020
Keywords
speech recognition; adaptation; acoustic factorisation; i-vectors; deep neural networks;
DOI
10.21437/Interspeech.2017-1365
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Adapting acoustic models jointly to both speaker and environment has been shown to be effective. In many realistic scenarios, however, either the speaker or environment at test time might be unknown, or there may be insufficient data to learn a joint transform. Generating independent speaker and environment transforms improves the match of an acoustic model to unseen combinations. Using i-vectors, we demonstrate that it is possible to factorise speaker or environment information using multi-condition training with neural networks. Specifically, we extract bottleneck features from networks trained to classify either speakers or environments. We perform experiments on the Wall Street Journal corpus combined with environment noise from the Diverse Environments Multichannel Acoustic Noise Database. Using the factorised i-vectors we show improvements in word error rates on perturbed versions of the eval92 and dev93 test sets, both when one factor is missing and when the factors are seen but not in the desired combination.
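The bottleneck extraction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the layer sizes, the tanh nonlinearity, the linear bottleneck, and the names `init_classifier` and `forward` are all assumptions made for illustration, and the weights are random rather than trained. It only shows how a classifier that maps i-vectors to speaker (or environment) labels exposes a low-dimensional bottleneck layer whose activations could serve as a factorised representation.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not values from the paper.
IVECTOR_DIM = 100     # assumed i-vector dimensionality
HIDDEN_DIM = 256      # assumed hidden layer width
BOTTLENECK_DIM = 30   # assumed bottleneck width
NUM_SPEAKERS = 10     # assumed number of speaker classes


def init_layer(n_in, n_out):
    """Return (weights, bias) for one fully connected layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)


def init_classifier(num_classes):
    """Build layers: i-vector -> hidden -> bottleneck -> class logits."""
    return [
        init_layer(IVECTOR_DIM, HIDDEN_DIM),
        init_layer(HIDDEN_DIM, BOTTLENECK_DIM),
        init_layer(BOTTLENECK_DIM, num_classes),
    ]


def forward(layers, ivectors):
    """Return (bottleneck activations, class posteriors) for a batch."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    hidden = np.tanh(ivectors @ w1 + b1)
    bottleneck = hidden @ w2 + b2              # linear bottleneck layer
    logits = bottleneck @ w3 + b3
    logits = logits - logits.max(axis=1, keepdims=True)  # stable softmax
    posteriors = np.exp(logits)
    posteriors /= posteriors.sum(axis=1, keepdims=True)
    return bottleneck, posteriors


# An untrained speaker classifier; in practice it would first be trained
# on multi-condition data with speaker labels.
speaker_net = init_classifier(NUM_SPEAKERS)
ivectors = rng.normal(size=(4, IVECTOR_DIM))   # stand-in for real i-vectors
speaker_factor, speaker_posteriors = forward(speaker_net, ivectors)
```

An environment-specific representation would follow the same pattern, with a second network built by `init_classifier` using the number of environment classes and trained on environment labels instead.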
Pages: 749-753
Number of pages: 5