EXTRACTING DEEP BOTTLENECK FEATURES USING STACKED AUTO-ENCODERS

Cited by: 0
Authors
Gehring, Jonas [1]
Miao, Yajie [2]
Metze, Florian [2]
Waibel, Alex [1, 2]
Affiliations
[1] Karlsruhe Inst Technol, Interact Syst Lab, D-76021 Karlsruhe, Germany
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
Bottleneck features; Deep learning; Auto-encoders; Networks
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this work, a novel training scheme for generating bottleneck features from deep neural networks is proposed. A stack of denoising auto-encoders is first trained in a layer-wise, unsupervised manner. Afterwards, the bottleneck layer and an additional layer are added and the whole network is fine-tuned to predict target phoneme states. We perform experiments on a Cantonese conversational telephone speech corpus and find that increasing the number of auto-encoders in the network produces more useful features, but requires pre-training, especially when little training data is available. Using more unlabeled data, for pre-training only, yields additional gains. Evaluations on larger datasets and on different system setups demonstrate the general applicability of our approach. In terms of word error rate, relative improvements of 9.2% (Cantonese, ML training), 9.3% (Tagalog, BMMI-SAT training), 12% (Tagalog, confusion network combinations with MFCCs), and 8.7% (Switchboard) are achieved.
Pages: 3377-3381
Number of pages: 5
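
The training scheme summarized in the abstract (layer-wise pre-training of denoising auto-encoders, then adding a bottleneck layer and one more layer before supervised fine-tuning) can be sketched roughly as below. This is a minimal NumPy illustration and not the authors' implementation: the function names, layer sizes, masking noise, tied weights, linear decoder, sigmoid bottleneck, full-batch gradient descent, and the random toy data are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_dae(data, n_hidden, noise=0.2, lr=0.1, epochs=20):
    """Train one denoising auto-encoder (tied weights, linear decoder,
    squared error) and return its encoder parameters (W, b)."""
    n, n_in = data.shape
    W = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        corrupted = data * (rng.random(data.shape) > noise)  # masking noise
        h = sigmoid(corrupted @ W + b_h)
        err = (h @ W.T + b_v) - data          # reconstruct the clean input
        d_h = (err @ W) * h * (1.0 - h)       # gradient at encoder pre-activation
        W -= lr * (corrupted.T @ d_h + err.T @ h) / n  # encoder + decoder terms
        b_h -= lr * d_h.mean(axis=0)
        b_v -= lr * err.mean(axis=0)
    return W, b_h

def forward(layers, x):
    """Return activations of every layer; the last layer is a softmax."""
    acts = [x]
    for i, (W, b) in enumerate(layers):
        z = acts[-1] @ W + b
        if i == len(layers) - 1:
            e = np.exp(z - z.max(axis=1, keepdims=True))
            acts.append(e / e.sum(axis=1, keepdims=True))
        else:
            acts.append(sigmoid(z))
    return acts

def finetune(layers, x, labels, lr=0.5, epochs=50):
    """Supervised fine-tuning of the whole network with cross-entropy."""
    onehot = np.eye(layers[-1][0].shape[1])[labels]
    for _ in range(epochs):
        acts = forward(layers, x)
        delta = (acts[-1] - onehot) / len(x)
        for i in reversed(range(len(layers))):
            W, b = layers[i]
            grad_W, grad_b = acts[i].T @ delta, delta.sum(axis=0)
            if i > 0:  # back-propagate through the sigmoid below
                delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
            layers[i] = (W - lr * grad_W, b - lr * grad_b)
    return layers

def build_and_train(unlabeled, x, labels, ae_sizes, bn_size, hid_size, n_classes):
    layers, data = [], unlabeled
    for n_hidden in ae_sizes:                 # greedy layer-wise pre-training
        W, b = pretrain_dae(data, n_hidden)
        layers.append((W, b))
        data = sigmoid(data @ W + b)          # clean encoding feeds the next DAE
    sizes = [ae_sizes[-1], bn_size, hid_size, n_classes]
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):  # randomly initialised layers
        layers.append((rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)))
    return finetune(layers, x, labels)

def bottleneck_features(layers, x, n_ae):
    """Activations of the bottleneck layer, which sits right after the DAEs."""
    return forward(layers, x)[n_ae + 1]

# Toy usage with random data standing in for acoustic frames and state labels.
X = rng.normal(size=(200, 20))
y = (X[:, 0] > 0).astype(int)
net = build_and_train(X, X, y, ae_sizes=[32, 32], bn_size=8, hid_size=16, n_classes=2)
feats = bottleneck_features(net, X, n_ae=2)
print(feats.shape)  # (200, 8)
```

In this sketch the unlabeled set passed to `build_and_train` may be larger than the labeled set, which mirrors the abstract's point that extra unlabeled data can be used for pre-training only.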