Factorised representations for neural network adaptation to diverse acoustic environments

Cited by: 4

Authors:

Fainberg, Joachim [1]

Renals, Steve [1]

Bell, Peter [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Funding:

European Union Horizon 2020;

Keywords:

speech recognition; adaptation; acoustic factorisation; i-vectors; deep neural networks;

DOI:

10.21437/Interspeech.2017-1365

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Adapting acoustic models jointly to both speaker and environment has been shown to be effective. In many realistic scenarios, however, either the speaker or environment at test time might be unknown, or there may be insufficient data to learn a joint transform. Generating independent speaker and environment transforms improves the match of an acoustic model to unseen combinations. Using i-vectors, we demonstrate that it is possible to factorise speaker or environment information using multi-condition training with neural networks. Specifically, we extract bottleneck features from networks trained to classify either speakers or environments. We perform experiments on the Wall Street Journal corpus combined with environment noise from the Diverse Environments Multichannel Acoustic Noise Database. Using the factorised i-vectors we show improvements in word error rates on perturbed versions of the eval92 and dev93 test sets, both when one factor is missing and when the factors are seen but not in the desired combination.

Pages: 749-753

Page count: 5
