Factorised representations for neural network adaptation to diverse acoustic environments

Cited by: 4
Authors
Fainberg, Joachim [1]
Renals, Steve [1]
Bell, Peter [1]
Affiliation
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Funding
EU Horizon 2020;
Keywords
speech recognition; adaptation; acoustic factorisation; i-vectors; deep neural networks;
DOI
10.21437/Interspeech.2017-1365
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Adapting acoustic models jointly to both speaker and environment has been shown to be effective. In many realistic scenarios, however, either the speaker or environment at test time might be unknown, or there may be insufficient data to learn a joint transform. Generating independent speaker and environment transforms improves the match of an acoustic model to unseen combinations. Using i-vectors, we demonstrate that it is possible to factorise speaker or environment information using multi-condition training with neural networks. Specifically, we extract bottleneck features from networks trained to classify either speakers or environments. We perform experiments on the Wall Street Journal corpus combined with environment noise from the Diverse Environments Multichannel Acoustic Noise Database. Using the factorised i-vectors, we show improvements in word error rates on perturbed versions of the eval92 and dev93 test sets, both when one factor is missing and when the factors are seen but not in the desired combination.
Pages: 749-753
Page count: 5