Factorised representations for neural network adaptation to diverse acoustic environments

Cited by: 4
Authors
Fainberg, Joachim [1]
Renals, Steve [1]
Bell, Peter [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh, Midlothian, Scotland
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Funding
European Union Horizon 2020
Keywords
speech recognition; adaptation; acoustic factorisation; i-vectors; deep neural networks;
DOI
10.21437/Interspeech.2017-1365
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Adapting acoustic models jointly to both speaker and environment has been shown to be effective. In many realistic scenarios, however, either the speaker or environment at test time might be unknown, or there may be insufficient data to learn a joint transform. Generating independent speaker and environment transforms improves the match of an acoustic model to unseen combinations. Using i-vectors, we demonstrate that it is possible to factorise speaker or environment information using multi-condition training with neural networks. Specifically, we extract bottleneck features from networks trained to classify either speakers or environments. We perform experiments on the Wall Street Journal corpus combined with environment noise from the Diverse Environments Multichannel Acoustic Noise Database. Using the factorised i-vectors we show improvements in word error rates on perturbed versions of the eval92 and dev93 test sets, both when one factor is missing and when the factors are seen but not in the desired combination.
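The bottleneck extraction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the class counts, the `BottleneckClassifier` name, and the use of untrained random weights are all assumptions made for illustration. The sketch only shows the general idea of feeding an i-vector through a classifier network and reading out the activations of a narrow intermediate layer as a factorised representation.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation=None):
    """Apply a dense layer to vector x, with an optional elementwise activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class BottleneckClassifier:
    """Hypothetical classifier: input -> hidden -> bottleneck -> class posteriors.

    In practice the network would be trained to classify speakers (or
    environments); here the weights are random, so only the data flow is shown.
    """

    def __init__(self, input_dim, hidden_dim, bottleneck_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.hidden = make_layer(input_dim, hidden_dim, rng)
        self.bottleneck = make_layer(hidden_dim, bottleneck_dim, rng)
        self.output = make_layer(bottleneck_dim, num_classes, rng)

    def extract(self, ivector):
        """Return the bottleneck activations for one i-vector."""
        h = dense(ivector, self.hidden, math.tanh)
        return dense(h, self.bottleneck)  # linear bottleneck layer

    def classify(self, ivector):
        """Return class posteriors computed on top of the bottleneck."""
        return softmax(dense(self.extract(ivector), self.output))


# Assumed sizes: 100-dim i-vector, 20-dim bottleneck, 10 speakers, 5 environments.
rng = random.Random(1)
ivector = [rng.gauss(0.0, 1.0) for _ in range(100)]

speaker_net = BottleneckClassifier(100, 64, 20, num_classes=10, seed=2)
environment_net = BottleneckClassifier(100, 64, 20, num_classes=5, seed=3)

speaker_factor = speaker_net.extract(ivector)          # speaker representation
environment_factor = environment_net.extract(ivector)  # environment representation
```

Under these assumptions, `speaker_factor` and `environment_factor` are two separate low-dimensional vectors derived from the same i-vector, one from each classifier, which could then be supplied to an acoustic model individually or together.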
Pages: 749-753
Page count: 5