Factorised representations for neural network adaptation to diverse acoustic environments

Cited by: 4
Authors
Fainberg, Joachim [1]
Renals, Steve [1]
Bell, Peter [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Funding
EU Horizon 2020
Keywords
speech recognition; adaptation; acoustic factorisation; i-vectors; deep neural networks;
DOI
10.21437/Interspeech.2017-1365
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Adapting acoustic models jointly to both speaker and environment has been shown to be effective. In many realistic scenarios, however, either the speaker or environment at test time might be unknown, or there may be insufficient data to learn a joint transform. Generating independent speaker and environment transforms improves the match of an acoustic model to unseen combinations. Using i-vectors, we demonstrate that it is possible to factorise speaker or environment information using multi-condition training with neural networks. Specifically, we extract bottleneck features from networks trained to classify either speakers or environments. We perform experiments on the Wall Street Journal corpus combined with environment noise from the Diverse Environments Multichannel Acoustic Noise Database. Using the factorised i-vectors we show improvements in word error rates on perturbed versions of the eval92 and dev93 test sets, both when one factor is missing and when the factors are seen but not in the desired combination.
Pages: 749-753
Number of pages: 5